Although the article was based on structures in the current language of the
day (i.e., Java), the suggestions that the authors make would probably be
even more appropriate for a language like Lisp, due to the smaller size of
objects and greater frequency of pointer traversals. One of their main
points was that data structures should be compacted, for instance by using
offsets instead of full pointers, so that more of them stay in cache. In
addition, they believe that related objects should be allocated together,
so that objects likely to be traversed next are brought into cache as well.
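A rough illustration of the offsets-instead-of-full-pointers idea (this is
not from the article, and all the names are invented): a list node that
links through a 32-bit index into a pool instead of through a 64-bit
pointer, so each node is half the size and twice as many fit in a cache
line.

#include <stdint.h>

#define NIL_INDEX 0xFFFFFFFFu

struct cnode {
    uint32_t car;           /* index (or immediate value) of the element */
    uint32_t cdr;           /* index of the next node, or NIL_INDEX */
};                          /* 8 bytes, vs. 16 for two 64-bit pointers */

static struct cnode pool[1u << 20];

static struct cnode *next(struct cnode *n)
{
    return n->cdr == NIL_INDEX ? 0 : &pool[n->cdr];
}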
Oddly enough, this seems suspiciously close to some of the optimizations
that were made in Lisp machines. CDR-coding reduced the size of cons cells
and the garbage collectors of the day tried to place items thought to be
used together on the same memory page. Back then, the reason was to try to
reduce disk to memory paging, but a mismatch is a mismatch, eh?
If the data in this article (as well as the corresponding articles from
SIGPLAN '99) is correct, spending a few cycles on decompressing a CDR might
make sense again today. The last analysis of CDR-coding I saw was from
about 10 years ago, when memory-CPU speeds were much more even. The
analysis then was that CDR-coding took more cycles to execute and caused
branch stalls, so it wasn't as good as directly coded addresses. But given
(a) superscalar execution, (b) 6-to-10 cycle cache memory latencies, and
(c) larger instruction caches and branch tables, CDR-coding might make
sense again.
Even more importantly, perhaps one CDR-codes only the nursery of a
generational scavenger. This allows more objects to be created before GC is
necessary and, as aging objects are pushed out to older tranches, those
objects that are going to be around for a while can be translated into the
normal, uncompressed representation. If the cache is large enough, the
entire nursery might stay cached, leading to very substantial speed increases.
So, again we see that the world simply re-invents Lisp but, more
importantly, one of the neatest hacks from the olden days might be
reasonable again. Opinions?
faa
> I just finished reading an article in the Dec. 2000 issue of Computer
> Magazine called "Making Pointer-based Data Structures Cache Conscious", by
> Chilimbi, et al.
[snip]
> Oddly enough, this seems suspiciously close to some of the optimizations
> that were made in Lisp machines. CDR-coding [...]
What does CDR-coding mean in this context? - To me, CDR is Clock Data
Recovery, and that's sure wrong here.
Kai
You can also encode those as small values and make sure that pointers
are always larger than those values, so you don't need extra bits.
> "CDR-coding" consists of stealing *another* two bits from the pointer
> representation. There are three possible values:
> - the next word of memory is the CDR, as usual
> - the next word of memory would be a pointer to NULL (the
> empty list), if it was there. i.e. the current element is
> the last in the list.
> - the next word of memory is the CAR of the next cons cell in
> the current list
The "next word is NULL" case is not strictly necessary so you
can reduce the number of required bits to just one.
> memory for contiguously-allocated lists. The cost in extra instructions
> to check for special-cases is quite high, especially when such a list is
> modified by inserting or deleting elements in the middle.
But such `setcdr' operations are not very frequent, or might
even not be allowed (in languages like ML).
Stefan
A less labour intensive rewrite is to make your code smaller (if you end up
with fewer cache misses, do you really care whether they're due to data or
instructions?). The latest major rewrite of Jaws (well, a few years old now)
was undertaken with just this aim in mind, and did show a noticeable
performance increase, despite having fewer unrolled loops etc.
Of course, writing compact code might mean having to eschew fashionable
techniques like ubiquitous object-orientation:-)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Ian Kemmish 18 Durham Close, Biggleswade, Beds SG18 8HZ, UK
i...@jawssytems.com Tel: +44 1767 601 361
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Behind every successful organisation stands one person who knows the secret
of how to keep the managers away from anything truly important.
But since instructions are mostly sequential, the hardware can prefetch
the next block into the cache, so even larger code may have few cache
misses. With data, even if the hardware does prefetching, it will only be
helpful if the data exhibits high locality. This is true for array
structures, but when there's lots of pointers and indirection, as there is
in Lisp, it's only true if the GC keeps pointers together with the things
they point to, or if the hardware has a pointer-aware prefetcher (as
someone mentioned existed on LMI Lisp Machines).
--
Barry Margolin, bar...@genuity.net
Genuity, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.
A Lisp "cons cell" is a data record that contains two pointers called
(for weird historical reasons) CAR and CDR. The most common use of them
is to build linked lists in which the CDR of each cell points to the
next element in the list and the CAR points to the data item at that
position in the list.
In fact for efficiency the "pointers" aren't always actually pointers.
In particular it's useful to be able to store an integer directly in the
CAR and/or CDR positions, rather than storing a pointer to some record
containing an integer. Therefore most Lisps arrange to reserve, e.g., the
even values for actual pointers and the odd values for integers, with an
integer N represented as 2N+1. On machines with compulsory word-alignment
for pointers this doesn't affect pointers at all, and gives you half the
number of integer values you'd otherwise have (i.e. 31 bits on a 32-bit
machine).
It's perhaps only slightly less common to want to store characters,
booleans, the empty list, the empty array, the zero-length string, etc.
directly as well. This requires stealing another bit from the pointer
representation, forcing all pointers to be 4-byte aligned (on a
byte-addressed machine).
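In C terms the basic trick looks something like this (just a sketch; which
values get which tags varies between implementations):

#include <stdint.h>

typedef uintptr_t lispobj;               /* one tagged machine word */

#define POINTERP(x)     (((x) & 1) == 0)            /* even => real pointer */
#define FIXNUMP(x)      (((x) & 1) != 0)            /* odd  => integer */
#define MAKE_FIXNUM(n)  ((lispobj)(((uintptr_t)(n) << 1) | 1))  /* 2N+1 */
#define FIXNUM_VALUE(x) ((intptr_t)(x) >> 1)  /* assumes arithmetic shift */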
Cons cells can be used to build any data structure, but simple
single-linked lists are very common, and it's fairly common to allocate
them all at once, which means that each cons cell ends up pointing at a
cell right next to it in memory.
"CDR-coding" consists of stealing *another* two bits from the pointer
representation. There are three possible values:
- the next word of memory is the CDR, as usual
- the next word of memory would be a pointer to NULL (the
empty list), if it was there. i.e. the current element is
the last in the list.
- the next word of memory is the CAR of the next cons cell in
the current list
You now have a total of four bits stolen from the pointer
representation, but as a result you might be able to save quite a lot of
memory for contiguously-allocated lists. The cost in extra instructions
to check for special-cases is quite high, especially when such a list is
modified by inserting or deleting elements in the middle.
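The read side comes out roughly like this in C (a sketch only; here the two
code bits are kept in each cell's first memory word, as the Lisp machines
did, a list is just a pointer to a car word, and NIL is a null pointer --
all invented for illustration):

#define CDR_NORMAL 0u  /* next word of memory holds the cdr, as usual */
#define CDR_NIL    1u  /* this is the last cell: the cdr is the empty list */
#define CDR_NEXT   2u  /* next word of memory is the car of the next cell */

typedef unsigned long word;             /* one tagged memory word */

word *cdr(word *cell)                   /* cell points at the car word */
{
    switch (cell[0] & 3u) {             /* the two "stolen" bits */
    case CDR_NORMAL: return (word *)cell[1];   /* cdr is in the next word */
    case CDR_NIL:    return 0;                 /* empty list */
    case CDR_NEXT:   return cell + 1;          /* next cell is adjacent */
    default:         return 0;                 /* fourth code unused here */
    }
}

Every CDR (and every insert/delete) has to go through a test like that
switch, which is where the extra instructions come from.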
-- Bruce
> You now have a total of four bits stolen from the pointer
> representation, but as a result you might be able to save quite a lot of
> memory for contiguously-allocated lists. The cost in extra instructions
> to check for special-cases is quite high, especially when such a list is
> modified by inserting or deleting elements in the middle.
Note that one such test that becomes nontrivial on General Purpose
hardware (i.e. not lisp machines) is EQ. Instead of one instruction,
those extra bits must be masked off before the comparison. I don't
know if anyone has ever considered placing those extra bits "elsewhere",
e.g. in a parallel location to the actual pointer/tag word.
--
Duane Rettig Franz Inc. http://www.franz.com/ (www)
1995 University Ave Suite 275 Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253 du...@Franz.COM (internet)
2. It would complicate the hardware. We had hardware that
`understood' lists and could do `prefetches' on them.
3. Only about 10% of the allocated storage in a running Lisp
machine was compressed cdr-coded lists.
And with 64 bits to play with, who's to say we can't spare a couple for
CDR-coding?
In article <4k89cn...@beta.franz.com>,
Duane Rettig <du...@franz.com> wrote:
>Bruce Hoult <br...@hoult.org> writes:
>> You now have a total of four bits stolen from the pointer
>> representation, but as a result you might be able to save quite a lot of
>> memory for contiguously-allocated lists. The cost in extra instructions
>> to check for special-cases is quite high, especially when such a list is
>> modified by inserting or deleting elements in the middle.
>
>Note that one such test that becomes nontrivial on General Purpose
>hardware (i.e. not lisp machines) is EQ. Instead of one instruction,
>those extra bits must be masked off before the comparison. I don't
>know if anyone has ever considered placing those extra bits "elsewhere",
>e.g. in a parallel location to the actual pointer/tag word.
Getting back to the original point of this thread, given superscalar
CPUs running in the GHz region, register-to-register or
immediate-to-register instructions are cheap (as long as they're in the
I-cache), and dereferencing pointers is *extremely* expensive (if the
targets are not in the D-cache).
Duane, I figure you would know this as well as anyone, so I have to ask:
What is the true cost of doing this masking compared to that of chasing
pointers into uncached RAM on a couple of well-known architectures,
e.g. x86, SPARC, or PowerPC? If it's done all the time I imagine the
instruction fetches would be damned near free.
-- Chuck, used-to-wannabe computer architect
--
Chuck Fry -- Jack of all trades, master of none
chu...@chucko.com (text only please) chu...@tsoft.com (MIME enabled)
Lisp bigot, car nut, photographer, sometime guitarist and mountain biker
The addresses above are real. All spammers will be reported to their ISPs.
> So, again we see that the world simply re-invents Lisp but, more
> importantly, one of the neatest hacks from the olden days might be
> reasonable again. Opinions?
I have an even more efficient and radical idea: Don't use lists.
They are almost always the wrong data structure. Use arrays.
Arrays are available and efficient in Lisp, Scheme, Java, C++,
Fortran, etc etc.
If you need variable-sized data structures, you should still use
arrays. Just double the size of the array when it becomes too small.
You double the array to make sure that the amortized (average) cost of
insertions is constant. Even just after you've doubled the array, when
almost half the space is unused, you are not using any more space
than a linked list would take.
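I.e. something like this sketch:

#include <stdlib.h>

struct vec { int *data; size_t len, cap; };

int vec_push(struct vec *v, int x)            /* amortized O(1) append */
{
    if (v->len == v->cap) {
        size_t ncap = v->cap ? 2 * v->cap : 8;     /* double when full */
        int *p = realloc(v->data, ncap * sizeof *p);
        if (!p) return -1;
        v->data = p;
        v->cap = ncap;
    }
    v->data[v->len++] = x;
    return 0;
}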
If it is important to be able to insert/delete in the middle of
a sequence, you might consider using a list. But consider a
buffer-gap array first. Instead of leaving the unused space at
the end of the array, allow it to be in the middle, wherever
insertions/deletions are done. You need to "move the gap" (i.e.
copy things around in the data structure) when the gap is not where
you need to insert/delete, but most of the time insertions/deletions
will tend to be clustered (near each other).
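A minimal sketch of the buffer-gap idea (essentially how Emacs stores
buffer text; growing the buffer when the gap closes is left out):

#include <string.h>

struct gapbuf {
    char *buf;                          /* cap bytes of storage */
    size_t cap, gap_start, gap_end;     /* gap occupies [gap_start, gap_end) */
};

static void move_gap(struct gapbuf *g, size_t pos)
{
    if (pos < g->gap_start) {                   /* slide gap to the left */
        size_t n = g->gap_start - pos;
        memmove(g->buf + g->gap_end - n, g->buf + pos, n);
        g->gap_start -= n;
        g->gap_end -= n;
    } else if (pos > g->gap_start) {            /* slide gap to the right */
        size_t n = pos - g->gap_start;
        memmove(g->buf + g->gap_start, g->buf + g->gap_end, n);
        g->gap_start += n;
        g->gap_end += n;
    }
}

int gap_insert(struct gapbuf *g, size_t pos, char c)
{
    if (g->gap_start == g->gap_end)
        return -1;                      /* gap closed: grow (double) first */
    move_gap(g, pos);
    g->buf[g->gap_start++] = c;
    return 0;
}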
An argument could be made that lists are easy to use in functional
(side-effect-free) programs - just use recursion. But I don't think
that is a strong argument. You can write clean, efficient, side-effect-free
code using arrays: Just look at APL and its successors. Modern
functional languages have "list comprehensions" - these can just as
easily be used for "array comprehensions".
I am talking about standard Lisp-style lists implemented using pairs.
There is a better case to be made for chaining objects together using a
link slot in more complex objects. At least in that case you're not
allocating a separate cons cell for each link.
--
--Per Bothner
p...@bothner.com http://www.bothner.com/~per/
Oh, Christ! We've been there, done that, about a million times.
| They are almost always the wrong data structure. Use arrays. Arrays
| are available and efficient in Lisp, Scheme, Java, C++, Fortran, etc
| etc.
Yes, so they are. Available. Arrays are also extremely inefficient
where lists are not the wrong data structure. Hence, use lists _and_
arrays, as appropriate. Lisp programmers are not the idiots you think
they are, Per. They figure out when it makes sense to use arrays even
if you don't and need to reduce the problem to silliness.
| If it is important to be able to insert/delete in the middle of
| a sequence, you might consider using a list.
If your data structures nest, use lists.
If your data structures have unpredictable structure, use lists.
If you need to maintain uniformity of type for the "rest" after
processing an initial portion of the data structure, use lists. (If
you think this only applies to recursion, which it seems you do, you
really must think more carefully about algorithm design with arrays
and why displaced arrays or passing _pairs_ (sorry, data structures in
small arrays, of course) of the array and the index won't cut it.)
If you need to manipulate the structure of the data structure in any
way, not just insert/delete, use lists. (Flattening, pairing, etc.)
If after much headache and painfully idiotic programming mistakes, you
discover that your array implementation really must use two elements
per "array", one data and one "next", you have reinvented the cons.
Congratulations! Welcome to the wisdom of 40 years of language design!
| I am talking about standard Lisp-style lists implemented using pairs.
| There is a better case to be made for chaining objects together using
| link slot in a more complex objects. At least in that case you're not
| allocating a separate cons cell for each link.
It is just insane to add "link" slots to more complex objects when you
can separate and abstract out the container concept from the containee.
Who cares about allocating separate cons cells? Simple arrays have to
be more than four elements long to take less space than the equivalent
list using cons cells. Specialized arrays may have to be much longer.
The allocation for cons cells is extremely efficiently implemented in
all Lisp implementations. Using your stupid arrays-for-lists would of
course force some super-clever implementation of exponent-of-2-sized
arrays that double in size as they are used _and_ move around all the
time (destroying object identity unless you waste even _more_ space on
their already inefficient implementation), but people have been there,
many times over, and they have _returned_ to lists, implemented via
cons cells, when that has been the most intelligent solution.
What we have left after so many years of using lists _and_ arrays
_and_ structures _and_ clos objects _and_ databases, etc, is that
there is not a single argument left for using anything _but_ lists
where lists are used. If you want to teach novices to use arrays, at
least don't make the misguided Scheme-style argument that alternatives
to the obvious and the most natural must be chosen because it is somehow
more divine to use recursion, oops, I meant arrays.
You've done so much good work. How come you don't know simple things?
#:Erik
--
"When you are having a bad day and it seems like everybody is trying
to piss you off, remember that it takes 42 muscles to produce a
frown, but only 4 muscles to work the trigger of a good sniper rifle."
-- Unknown
> now) was undertaken with just this aim in mind, and did show a noticeable
> performance increase, despite having fewer unrolled loops etc.
Maybe not despite, but because of, fewer unrolled loops. For exactly the
cache footprint reasons mentioned, I've heard people argue that these
days, unrolled loops are actually counterproductive.
Kai
--
http://www.westfalen.de/private/khms/
"... by God I *KNOW* what this network is for, and you can't have it."
- Russ Allbery (r...@stanford.edu)
You mean like this?:
http://www.andansbutt.org/ttk/extant
There's a statement about efficiency of doubling array sizes
in the documentation that could have almost been a quote from
your paragraph above :-)
-- TTK
--
"Live and Learn; Die and Learn Faster"
(motto of the Lynbrook Historical Simulations club)
Technical and RKBA documents at: http://www.ciar.org/~ttk/
Except for a few CPU intensive applications, the flexibility of
lists far compensates for the loss of speed. STL vectors, for
example, are implemented with this doubling strategy, but they
also provide the resize() member function: if you have a
2,000,000 element vector you're going to need 8Mb just to add
one element to it!
And who would sacrifice the elegance of object orientation and
the expedience and breadth of STL to program by hand in plain
vanilla C using arrays? Besides running benchmarks and the odd
go-for-broke scientific app, I see no reason why.
At some point in the past we evolved from writing in assembler
to writing in block-structured languages; we're in the process
of leaving BS for Object Orientation. Dynamic memory allocation
is of fundamental importance now, and we use it even when we
must give away some performance.
Alberto.
An interesting case study is symbolic algebra systems.
Of the 4 major systems:
Reduce = based on lisp ==> uses lists
Macsyma = based on lisp ==> uses lists
Mathematica = ?? written in its own list-based programming language ??
Maple = written in its own array-based programming language
The 3 "M" systems all offer roughly comparable functionality (Reduce
is a bit behind). But Maple is generally regarded as a factor of 3-5
faster, and anywhere from a factor of 2 to a factor of 10 smaller in
memory usage than the other two.
Maple uses contiguous arrays, or trees whose nodes are contiguous arrays,
for all its data structures. The authors state that this is one reason
(though certainly not the only one) for their performance advantages.
--
-- Jonathan Thornburg <jth...@thp.univie.ac.at>
http://www.thp.univie.ac.at/~jthorn/home.html
Universitaet Wien (Vienna, Austria) / Institut fuer Theoretische Physik
Q: Only 7 countries have the death penalty for children. Which are they?
A: Congo, Iran, Nigeria, Pakistan[*], Saudi Arabia, United States, Yemen
[*] Pakistan moved to end this in July 2000. -- Amnesty International,
http://www.web.amnesty.org/ai.nsf/index/AMR511392000
I implemented a similar idea for dynamic arrays but with the addition of
a "bump-size" parameter. The idea is to double the size of the array
until it hits the bump-size, say 4Kb, and thereafter only grow the array
in multiples of bump-size.
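I.e. the growth policy is just this (sketch; BUMP_SIZE and the starting
size are made up):

#include <stddef.h>

#define BUMP_SIZE (4u * 1024)           /* 4Kb "bump" increment */

size_t next_capacity(size_t cap)        /* capacity in bytes */
{
    if (cap < BUMP_SIZE)
        return cap ? 2 * cap : 64;      /* double while small */
    return cap + BUMP_SIZE;             /* then grow linearly */
}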
Of course the downside is that growing an 8Mb array still requires *another*
8Mb plus 4Kb, assuming that realloc() will be unable to simply resize the
original array in place. But the original 8Mb will be freed.
But note that if the data is stored in a singly-linked list, it still
eats up 16Mb of memory - 8Mb for the data plus another 8Mb for the
next pointers.
For really large arrays such as the 8Mb example, a combined approach of
a linked-list of smaller (say 16Kb) arrays would seem to be a more
manageable solution, for a modest increase in code-size.
But I digress, and no doubt bore to tears.
-- Parag Patel
Sounds like UNIX 'clists'.
--
`-_-' In hoc signo hack, Peter da Silva.
 'U` "Milloin halasit viimeksi suttasi?" ("When did you last hug your wolf?")
Disclaimer: WWFD?
Alberto.
One technique I've used for threaded transactions with variable
numbers of entries of the same fixed length is to use dynamic
array sizing (with doubling) for allocation, with the list
entries as array entries. You can then use indices into the array
instead of pointers, and reduce the allocation overhead from
thousands of small chunks of memory to a dozen or so
reallocations of a single chunk, with a very simple custom subpool
allocator that never reuses memory until it can all be freed as
one chunk.
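Concretely, it's along these lines (a sketch, with all names invented):

#include <stdlib.h>

#define END_OF_LIST (-1)

struct entry { int next; int payload; };  /* next is an index, not a pointer */

struct pool { struct entry *e; size_t len, cap; };

/* Allocate an entry from the pool (growing by doubling) and push it on
   the front of the list whose head index is *head. */
int pool_push(struct pool *p, int *head, int payload)
{
    if (p->len == p->cap) {
        size_t ncap = p->cap ? 2 * p->cap : 64;
        struct entry *ne = realloc(p->e, ncap * sizeof *ne);
        if (!ne) return -1;
        p->e = ne;
        p->cap = ncap;
    }
    p->e[p->len].next = *head;
    p->e[p->len].payload = payload;
    *head = (int)p->len++;
    return 0;
}

When the transaction is done, one free(p->e) releases the lot.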
Of course, it's always better if you can compute an upper bound
on all the memory that may be required at the very start, and
allocate it all at one time, but that can only be done where you
know all the parameter values required to size the space at the
start of fairly simple transactions.
Thanks. Take care, Brian Inglis Calgary, Alberta, Canada
--
Brian_...@CSi.com (Brian dot Inglis at SystematicSw dot ab dot ca)
use address above to reply
The fun part is that this is exactly the approach which x86 forced us to
use back in the 16-bit days:
A list (or array) of arrays, with each subarray less than 64K in size.
I usually set the number of elements in each subarray so as to make it
between 32K and 64K in total size, and still a power of two, to simplify
addressing.
Since all accesses within the same subarray would be to the same
segment, the overhead of segment reloads could often/mostly be avoided,
at least for (semi-)sequential accesses.
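With power-of-two subarrays the index arithmetic is just a shift and a mask
(sketch; the sizes here are picked arbitrarily):

#define CHUNK_SHIFT 13                     /* 8192 ints = 32Kb per subarray */
#define CHUNK_MASK  ((1ul << CHUNK_SHIFT) - 1)

struct bigarray { int **chunks; };         /* chunks[i] points at 8192 ints */

int get(const struct bigarray *a, unsigned long i)
{
    return a->chunks[i >> CHUNK_SHIFT][i & CHUNK_MASK];
}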
Terje
--
- <Terje.M...@hda.hydro.com>
Using self-discipline, see http://www.eiffel.com/discipline
"almost all programming can be viewed as an exercise in caching"
> And with 64 bits to play with, who's to say we can't spare a couple for
> CDR-coding?
Let's not make this mistake again. When IBM (and Amdahl) created their XA
architectures for the 370 compatibles (extending the address range from 24
bits to 32 bits), several programs suddenly broke. These programs, which
used the upper 8 bits of address words for tags and flags, assumed (correctly,
but without foresight) that these bits were available for use. However,
whether these programs were justified in the original designs or not, it
would be safe for me to venture a guess that _none_ of them exist in their
previous design form, and that all have been either rewritten or scrapped.
But now, we're dealing with a different issue - one that makes it even harder
to justify using more address bits: 64-bit architectures tend to define their
entire virtual space. For example, the PA-RISC with the W bit set defines
text segments starting at 0x4000000000000000, and data segments starting at
0x8000000000000000. Which bits would you steal? Some bits in the middle?
Of course, lisps on GP hardware already use bits within the address space for
tags, but they tend to be only the least significant bits. There is indeed
a cost to this usage of bits, namely the granularity of lisp object
addressing, which increases to a two-word "allocation-unit" boundary
in the major lisps (the minimum size needed for a cons cell). This
increased granularity presents no problem for lisp implementations, and it
implies 3 bits of tag for 32-bit lisps, and 4 bits of tag for 64-bit lisps.
This means, all other things being equal, that there is only one extra bit
left for a decision to add cdr-coding. I don't think that one bit is enough;
I believe that you need two bits.
> In article <4k89cn...@beta.franz.com>,
> Duane Rettig <du...@franz.com> wrote:
> >Bruce Hoult <br...@hoult.org> writes:
> >> You now have a total of four bits stolen from the pointer
> >> representation, but as a result you might be able to save quite a lot of
> >> memory for contiguously-allocated lists. The cost in extra instructions
> >> to check for special-cases is quite high, especially when such a list is
> >> modified by inserting or deleting elements in the middle.
> >
> >Note that one such test that becomes nontrivial on General Purpose
> >hardware (i.e. not lisp machines) is EQ. Instead of one instruction,
> >those extra bits must be masked off before the comparison. I don't
> >know if anyone has ever considered placing those extra bits "elsewhere",
> >e.g. in a parallel location to the actual pointer/tag word.
>
> Getting back to the original point of this thread, given superscalar
> CPUs running in the GHz region, register-to-register or
> immediate-to-register instructions are cheap (as long as they're in the
> I-cache), and dereferencing pointers is *extremely* expensive (if the
> targets are not in the D-cache).
I believe that EQ testing does speak directly to the point of cdr-coding;
if you measure the number of times CDR is taken vs the number of times
EQ is performed in a typical program, you find that EQ usage overwhelms
CDR usage. Thus, there is a strong impetus to optimize EQ testing
in preference to CDR operations.
Remember that any boolean test, e.g. "(when x ...)", is equivalent to
"(unless (eq x nil) ...)", in much the same way that in C "if (x) {...}"
is equivalent to "if (x != 0) {...}", so there are many more EQ tests than
meet the eye in lisp code. If you use a bit which changes the value
of a pointer without changing its object's identity, then tests for
identity can no longer be performed in one instruction, and lisp then
loses to other languages...
> Duane, I figure you would know this as well as anyone, so I have to ask:
> What is the true cost of doing this masking compared to that of chasing
> pointers into uncached RAM on a couple of well-known architectures,
> e.g. x86, SPARC, or PowerPC? If it's done all the time I imagine the
> instruction fetches would be damned near free.
Well, obviously bit manipulation is faster than memory access, but why
are you asking this question? What do you have in mind?
And why uncached RAM, as opposed to cached data? If you're talking
about normal cons-cell lists vs cdr coding, shouldn't we assume a garbage
collector that can/does tend to keep list structures local with respect
to each other?
> Let's not make this mistake again. When IBM (and Amdahl) created their XA
> architectures for the 370 compatibles (extending the address range from 24
> bits to 32 bits), several programs suddenly broke. These programs, which
> used the upper 8 bits of address words for tags and flags, assumed (correctly,
> but without foresight) that these bits were available for use. However,
> whether these programs were justified in the original designs or not, it
> would be safe for me to venture a guess that _none_ of them exist in their
> previous design form, and that all have been either rewritten or scrapped.
Thanks for reminding me. Most of my stuff broke. But the mess was even
worse: they had to put in mode bits to allow older programs to run
unchanged. In a system like VM you ended up with two mode bits (EC/BC
and XA or something like that? I've forgotten most of this stuff and I
can't find my yellow booklet) and programs that only ran with one
combination of them. It was very nice to have eight bits to work with
with each pointer. Perhaps architecture designers should reconsider
giving user programs some bits in the address space.
--
Lieven Marchand <m...@bewoner.dma.be>
Lambda calculus - Call us a mad club
> It was very nice to have eight bits to work with
> with each pointer. Perhaps architecture designers should reconsider
> giving user programs some bits in the address space.
OK, let's play this request out: suppose that general purpose computer
manufacturer A decides to put in two extra bits besides the address, so
that instead of 64 bits of address, there are now 64 bits plus two extra
for tags/flags (two bits, eight bits, the story is still the same). Lisp
vendor L loves it, and designs some cdr-coding (or whatever else it wants
to do).
Now, in order to use those bits, they must be available per word, so
they must be included in the data path and memory. Thus, each data
word now has 64 bits for regular data plus these two extra bits for
L to use.
Now, C vendor (C, of course) and Java vendor J are getting on the
bandwagon of the newly popular A machines, but they notice that they
are not getting all they could be getting out of the architecture;
they notice that there are 66 bits of address width, 66 bits in the
data path, and a disconnect between the data and address in two of
the bits, and thus the address space is 1/4 the size that it could
be if those bits could be made use of. So C and J complain to A.
The A developers then set to work on the new A' (or should we call
it XA? :-) architecture which includes full functionality for the 65th
and 66th bit, it comes out and leaves L to reimplement its lisp in more
general terms or else be lost in the dust.
The moral of this is that whatever we ask of hardware vendors had better
be general-purpose enough to be useful in any language, or else it will
eventually fail.
A few of those 24-bit programs still exist, and they still run! Only those
programs that wanted to exploit the wonders of 31-bit mode were exposed, and
indeed those who coded without foresight were doomed to continue to play in
their little 24-bit corner. But they didn't break -- IBM takes backwards
compatibility seriously (for user-level programs, that is). They simply
couldn't automatically exploit new features.
There was the following problem, however. For the most part 31-bit mode
was not very different, so some programmers were lulled into a false
sense of security and thought they could simply run in 31-bit mode.
Those are the programs that broke when subtle differences became fatal.
Technically, the architecture supports both 24-bit (almost 360-compatible)
and 31-bit mode, with user-level instructions to switch mode. The default
settings for loaders are such that programs which don't declare any special
properties will run in 24-bit mode, so those dusty old object decks will
still run. Modern compilers (and assembler programmers) flag their code
as 31-bit, of course.
The newly-announced z-series (64-bit S/390) adds a third mode (24-31-64),
and old 24-bit programs are *still* supported. Current 31-bit programs
will run just fine on z-machines. Those that know better can use new
instructions to access the full 64-bit registers, or enter 64-bit mode
and explore as large an address space as the control program will give
them; those that don't will not gain from the new capabilities, but they
will not lose anything either. THAT's compatibility.
Michel.
>
> OK, let's play this request out: suppose that general purpose computer
> manufacturer A decides to put in two extra bits besides the address....
[long description elided]
> The moral of this is that whatever we ask of hardware vendors had better
> be general-purpose enough to be useful in any language, or else it will
> eventually fail.
I don't buy this line of reasoning. There are far too many `unusual'
machines that are commercially viable for this to be the case (think
about some of the IBM architectures).
Even `standard' computer architectures have their quirks: In many
processors there are constraints on the use of the topmost address
bits, and in many standard architectures there are extra bits on each
word for ECC (all that wasted RAM).
Of course, as it stands right now, new hardware that doesn't
faithfully mimic the Pentium instruction set at better than about
500MHz would be at an extreme disadvantage.
> In article <4aea2i...@beta.franz.com>,
> Duane Rettig <du...@franz.com> wrote:
> >chu...@best.com (Chuck Fry) writes:
> >
> >> And with 64 bits to play with, who's to say we can't spare a couple for
> >> CDR-coding?
> >
> >Let's not make this mistake again. When IBM (and Amdahl) created their XA
> >architectures for the 370 compatibles (extending the address range from 24
> >bits to 32 bits), several programs suddenly broke. These programs, which
> >used the upper 8 bits of address words for tags and flags, assumed (correctly,
> >but without foresight) that these bits were available for use. However,
> >whether these programs were justified in the original designs or not, it
> >would be safe for me to venture a guess that _none_ of them exist in their
> >previous design form, and that all have been either rewritten or scrapped.
> >[snip]
>
> A few of those 24-bit programs still exist, and they still run! Only those
> programs that wanted to exploit the wonders of 31-bit mode were exposed, and
> indeed those who coded without foresight were doomed to continue to play in
> their little 24-bit corner. But they didn't break -- IBM takes backwards
> compatibility seriously (for user-level programs, that is). They simply
> couldn't automatically exploit new features.
I stand corrected, if there are 24-bit programs out there. (and I also
stand corrected for talking about 32 bits instead of 31 bits - forgot
about that little difference :-) I am also all for backward
compatibility. However, I can say with certainty (almost by definition)
that the only reason why those 24-bit programs could stay 24-bit was
because they were not _forced_ to go to 31 bits. This may seem like
circular reasoning, but it provides the preparation to point out
something you missed below:
> There was the following problem, however. For the most part 31-bit mode
> was not very different, so some programmers were lulled into a false
> sense of security and thought they could simply run in 31-bit mode.
> Those are the programs that broke when subtle differences became fatal.
What you call a subtle difference is certainly not so subtle. One
criticism that Lisp as a language has had to bear is its tendency to
be large (though nowadays it tends to be one of the smaller
languages, comparatively).
to handle large problem spaces well has always allowed it to be used
for increasingly larger projects. Currently, we have customers,
especially in the CAD realm, whose data spaces push beyond the normal
2 to 3 Gb address spaces allowed by 32 bit systems, thus causing a
need to move toward usage of 64-bit addressing. Certainly it was
long ago that these problem sets broke through the now seemingly
pitiful virtual address range of 16 Mb, which is what 24 bits
of addressing provides.
So it was not necessarily a false sense of security in an optional
move, but instead a non-scalable design in an ever-growing need for
space, which caused the need for at least some programs to move to
31-bit mode, and which thus broke. I suspect that there were at
least some programs besides Lisps which broke in this way, but I
don't myself know that.
> Duane Rettig <du...@franz.com> writes:
>
> >
> > OK, let's play this request out: suppose that general purpose computer
> > manufacturer A decides to put in two extra bits besides the address....
>
> [long description elided]
>
> > The moral of this is that whatever we ask of hardware vendors had better
> > be general-purpose enough to be useful in any language, or else it will
> > eventually fail.
>
> I don't buy this line of reasoning. There are far too many `unusual'
> machines that are commercially viable for this to be the case (think
> about some of the IBM architectures).
Which machines, and how so `unusual'? Are you really still talking
about general-purpose machines?
> Even `standard' computer architectures have their quirks: In many
> processors there are constraints on the use of the topmost address
> bits, and in many standard architectures there are extra bits on each
> word for ECC (all that wasted RAM).
True enough, but ECC and parity tend to be transparent to the user
level programmer. They aren't "seen".
As for segmented architectures, even HP's PA-RISC, which forced the
assembler programmer to load segment registers and perform external
branches in their 32-bit architecture - they fixed their instruction
set to allow continuous addressing in their 64-bit architecture.
Now it is also transparent to the user-level programmer.
> Of course, as it stands right now, new hardware that doesn't
> faithfully mimic the Pentium instruction set at better than about
> 500MHz would be at an extreme disadvantage.
Well, as a software vendor to many different architectures, I disagree,
but that is really another discussion.
But if push comes to pop, then it is probably a stack :-) (I'm sorry,
I couldn't resist. I'll go back in my hole now.)
--
- Stephen Fuld
Unfortunately, your guess would be wrong! In MVS and its successors, there
is still the concept of below/above the line, where the line is the address
(16MB) that fits within 24 bits. Some code and data structures must still
run "below the line". It is truely amazing how long a bad decision can
haunt one :-(.
--
- Stephen Fuld
Snip
At least on x86, this is much less costly than it would seem:
if (x & SIGNIFICANT_BITS)
takes exactly (modulo slightly increased opcode size) the same time as
if (x)
The relevant asm code would be:
test eax,eax
jnz skip
vs
test eax,SIGNIFICANT_BITS ; 32-bit immediate
jnz skip
On other architectures you would probably (semi-)permanently allocate a
register to contain that 'SIGNIFICANT_BITS' mask, which does carry some
cost.
> Duane Rettig wrote:
> [huge snip, lots of interesting stuff about CDR vs EQ]
> > I believe that EQ testing does speak directly to the point of cdr-coding;
> > if you measure the number of times CDR is taken vs the number of times
> > EQ is performed in a typical program, you find that EQ usage overwhelms
> > CDR usage. Thus, there is a strong impetus to optimize EQ testing
> > in preference to CDR operations.
> >
> > Remember that any boolean test, e.g. "(when x ...)" is equivalent to
> > "(unless (eq x nil) ...)", in a similar manner that in C "if (x) {...}"
> > is equivalent to "if (x!=0) {...}", so there are many more EQ tests than
> > meets the eye in lisp code. If you use a bit which changes the value
> > of a pointer without changing its object's identity, then tests for
> > identity can no longer be performed in one instruction, and lisp then
> > loses to other languages...
>
> At least on x86, this is much less costly than it would seem:
>
> if (x & SIGNIFICANT_BITS)
>
> takes exactly (modulo slightly increased opcode size) the same time as
Not a real issue, but can't resist this: 5 bytes instead of 2 bytes
is slight? :-)
> if (x)
>
> The relevant asm code would be:
>
> test eax,eax
> jnz skip
>
> vs
>
> test eax,SIGNIFICANT_BITS ; 32-bit immediate
> jnz skip
This magic works only when you are testing against integer 0. C
happens to pun zero as the null or false value, but in Common Lisp,
zero is a non-null integer and NIL is the false value (Scheme is even
more "pure", with nil being the empty list and a special false value,
both distinct from 0).
So such a test would necessarily require mask and compare instructions,
and could not use test.
Our lisp implementation keeps NIL in edi, and tests against it with
a long compare instruction, which is definitely faster than a mask
followed by a compare.
My second paragraph was intended as a reminder to lisp programmers
in case they forget how many times they do EQ comparisons. However,
EQ is a general comparison test, and is the most primitive operation
in lisp. It is, in effect, a single instruction to compare
two longwords (besides, of course, the jump instruction, although some
architectures have compare-and-jump instructions, thus eliding the final
jump at the end). It is the general comparison "instruction" that would
have to be enhanced to first mask off any bits that are "noise" with
respect to what lispers know to be two identical lisp objects. It is
this enhancement required by cdr-coding that I object to.
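In C terms the difference is roughly this (a sketch, not any particular
lisp's actual macros; lisp_nil and CDR_CODE_MASK are invented):

#include <stdint.h>

typedef uintptr_t lispobj;
extern lispobj lisp_nil;               /* the NIL object, kept in a register */
#define CDR_CODE_MASK ((lispobj)3)     /* the hypothetical stolen bits */

/* Without extra bits riding in the pointer, identity is one plain compare: */
#define EQ(a, b)    ((a) == (b))
#define NILP(x)     ((x) == lisp_nil)

/* With cdr-code bits in the pointer, the "noise" must be masked off first: */
#define EQ_CC(a, b) ((((a) ^ (b)) & ~CDR_CODE_MASK) == 0)
#define NILP_CC(x)  (((x) & ~CDR_CODE_MASK) == lisp_nil)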
Yes, as long as this isn't the only instruction used. I.e. even if 10%
of all code were tests for C NULL values, it would only increase the total
code size by 7.5%.
> > test eax,SIGNIFICANT_BITS ; 32-bit immediate
> > jnz skip
>
> This magic works only when you are testing against integer 0. C
> happens to pun zero as the null or false value, but in Common Lisp,
> zero is a non-null integer and NIL is the false value (Scheme is even
> more "pure", with nil being the empty list and a special false value,
> both distinct from 0).
>
> So such a test would necessarily require mask and compare instructions,
> and could not use test.
OK, this is much more serious!
Can anyone come up with a plausible program in the next two decades, to
run on conventional architectures, that will require an address space
between 60 and 64 bits? If you say "let's map everything into a flat
space" you will need more than 64 bits. If you stick to conventional
addressing, including memory mapping individual objects, you use a lot
less than 64 bits.
--
John Carr (j...@mit.edu)
> Can anyone come up with a plausible program in the next two decades, to
> run on conventional architectures, that will require an address space
> between 60 and 64 bits? If you say "let's map everything into a flat
> space" you will need more than 64 bits. If you stick to conventional
> addressing, including memory mapping individual objects, you use a lot
> less than 64 bits.
18 446 744 073 709 551 616 bits
2 305 843 009 213 693 952 bytes
2 147 483 648 gigabytes
Today, a "full" Usenet feed is around 250 GB/day, and the yearly
growth is around 200%. With two decades of continued growth at the
same rate, that'll be 250*3^20 gigabytes, or:
871 696 100 250 gigabytes
So the answer is, yes, there is a plausible program that will require
far more, if it is expected to use a system with flat addressing.
Since EMC delivers systems with storage capacity well into the
terabyte range today (enough for storing a full Usenet feed for well
over a week, perhaps as much as a month, I haven't checked their
products lately), it doesn't seem unlikely that they will have
something that can store more than 2 billion gigabytes (or 2 exabytes)
in 2020. It isn't as if it's _that_ much data. Perhaps, if you think
of it on a block level, and let each block be 16 KB, then you'll sort
of postpone the necessity of moving beyond 64 bit for a while.
Assumptions:
- Usenet continues to grow exponentially at approximately the same
rate (the growth rate has been slightly more than exponential for
the past 7 years or so, but was higher again before 1993)
- Some people will still want to store all of Usenet for more than
one day (i.e. a week or so)
- Storage technology can keep up with this growth
- Network technology can keep up
- Other computer technology can keep up
Sources:
- Comparative studies in my own ongoing thesis on Usenet
distribution
- Henry E. Hardy's "The Usenet System", ITCA Yearbook, 1993
- Jeremy Nixon's old news statistics on SuperNews (now unavailable)
- http://newsfeed.mesh.ad.jp/flow/
--
In the beginning was the Bit, and the Bit was Zero. Then Someone
said, Let there be One, and there was One. And Someone blessed them,
and Someone said unto them, Be fruitful, and multiply, and replenish
the Word and subdue it: and have dominion over every thing that is.
And Microsoft made the same mistake with their Basic, *twice* on the Macintosh
and then *again* on the Amiga.
I don't believe usenet will grow a million times over the
next 20 years or be mapped into a flat address space.
There's a limit to the amount of data a human can read or
write, in any medium.
When the company I worked for was bought the new owners told us
to expect their stock to continue growing at the rate of 80% per
year for many years to come. A few moments of thought made it
clear that the stock value would exceed the US GNP, or even the
world GNP, in only a few years. (We've actually lost about 30%
over 7 months.)
--
John Carr (j...@mit.edu)
> I don't believe usenet will grow a million times over the
> next 20 years or be mapped into a flat address space.
> There's a limit to the amount of data a human can read or
> write, in any medium.
You clearly don't know what Usenet is being used for these days. If
you thought it was plain text articles like those we're blessed with
seeing in comp.arch, you're dead wrong.
Have a looksie at alt.binaries, and come back when you've uploaded a
handful of CD images, like so many others seem to do.
Then watch out for those DVD images.
In the future, expect other storage formats to be uuencoded or
base64-coded just as successfully as today's formats are.
> When the company I worked for was bought the new owners told us
> to expect their stock to continue growing at the rate of 80% per
> year for many years to come. A few moments of thought made it
> clear that the stock value would exceed the US GNP, or even the
> world GNP, in only a few years. (We've actually lost about 30%
> over 7 months.)
This isn't stock. Usenet feeds will keep on growing until technology
can't let it grow anymore; the users will see to that.
(BTW, the text-only feed for Usenet is probably far less than 5 GB
today, and it isn't growing nearly as fast as the full feed.)
> Today, a "full" Usenet feed is around 250 GB/day, and the yearly
> growth is around 200%. With two decades of continued growth at the
> same rate, that'll be 250*3^20 gigabytes, or:
>
> 871 696 100 250 gigabytes
lets say that there is currently 6billion people ... then 250gb/day
amounts to about 40bytes/person/day. say in 20 years there are twice
the number of people or 12 billion people on the planet ... then the
871 696 100 250 gigabytes ... works out to around 7-8 million bytes
per person/day.
already the majority of the bytes are not text ... but binary of one
form or another. lets say that text is only 10% of the total ... and
something is worked out to handle the binary in some other way. That
would still say in 20 years that every person on the planet would have
to be generating around 750,000 text characters per day; say 6
characters per word ... around 120,000 words/day ... say 300 words/page
... around 400 pages/person/day.
majority of the usenet growth has been binary with some increase due
to more people involved. however, once every person in the world is
spending all their time only generating material for usenet ... i
would think that the growth would start to level off.
--
Anne & Lynn Wheeler | ly...@garlic.com - http://www.garlic.com/~lynn/
> Mathematica = ?? written in its own list-based programming language ??
IIRC what Mathematica calls lists are really arrays. I would expect its
(lack of) performance to be more related to the weird evaluation model.
Hannu Rummukainen
It won't take two decades.
This kind of thinking comes dangerously close to the "it'll never happen"
mantra that intelligent programmers and engineers have been (incorrectly)
reciting for at least the last 6 decades.
The mantra comes in two forms: "it can't be done", and "you'll never
use it all".
The first form is probably the most familiar, and usually occurs when
engineers see a wall in the technology they work with, past which the
technology doesn't work. It happened with vacuum tubes, magnetic core
memory, transistors, and even various forms of integrated circuits.
It has also happened in the mechanical world; I am amazed myself to see
gears and motors only a few molecules across. Are we at the limit there?
I wouldn't count on it...
The second form is more fun, because instead of picturing limits on
one's technology, we are picturing limits on one's own imagination.
My favorite example was in college, when I was in the university
computer lab, and talking to the resident guru about wanting to wait
to buy my own computer because I wanted one of those new machines with
64 kbytes of memory. His response was "what do you want with one of
those? You'll never use all that memory". Did I know myself precisely
what I would do with all that memory? No, but I knew I would find a
use...
> Jan Ingvoldstad <ja...@ifi.uio.no> writes:
>
> > Today, a "full" Usenet feed is around 250 GB/day, and the yearly
> > growth is around 200%. With two decades of continued growth at the
> > same rate, that'll be 250*3^20 gigabytes, or:
> >
> > 871 696 100 250 gigabytes
>
> lets say that there is currently 6billion people ... then 250gb/day
> amounts to about 40bytes/person/day. say in 20 years there are twice
> the number of people or 12 billion people on the planet ... then the
> 871 696 100 250 gigabytes ... works out to around 7-8 million bytes
> per person/day.
>
> already the majority of the bytes are not text ... but binary of one
> form or another. lets say that text is only 10% of the total ... and
==============================================^^
> something is worked out to handle the binary in some other way. That
> would still say in 20 years that every person on the planet would have
> to be generating around 750,000 text characters per day; say 6
> characters per word ... around 120,000 words/day ... say 300 words/page
> ... around 400 pages/person/day.
Why 10%? Why not 1%, or .00001%? How many characters in a picture?
Or a movie, or a symphony, or...
My boss' wife just had a baby. Almost immediately, he had four pictures
available for viewing. Very few words, the pictures said it all.
Much less than 10% text, I would say.
> majority of the usenet growth has been binary with some increase due
> to more people involved. however, once every person in the world is
> spending all their time only generating material for usenet ... i
> would think that the growth would start to level off.
The assumption that people will spend all of their time generating
material for usenet is based on a presumption that the rate of
material generation itself will remain constant. A dubious
presumption, I think.
There is a limit regardless of medium -- even video can't usefully go
over tens of megabytes per second. But let's assume that usenet will
grow 2x per year forever, and the world will come to an end if a day of
usenet can't fit into memory. (Or a week, month, or all of time.)
Continued exponential growth actually makes my case stronger. Because
no number of bits will be adequate in the long term, reducing the
effective address space by 1 bit only advances the transition to a
bigger address by one doubling period.
--
John Carr (j...@mit.edu)
> In article <oasr93d...@skidbladnir.ifi.uio.no>,
> Jan Ingvoldstad <ja...@ifi.uio.no> wrote:
> >On 12 Dec 2000 16:21:12 GMT, j...@mit.edu (John F Carr) said:
> >
> >> I don't believe usenet will grow a million times over the
> >> next 20 years or be mapped into a flat address space.
> >> There's a limit to the amount of data a human can read or
> >> write, in any medium.
You are making the unwarranted assumption that address space is only
used for addressing memory. There are a number of good reasons that
you might want 64 address bits, but not use (expt 2 64) bytes.
Meanwhile, there's another limit that might be harder to deal with:
250 GB/day is about 23 megabits/sec average, i.e. half a t-3 for the
news backbone. The expanded version of usenet will require about
80 exabits/sec for the backbone; to give you a rough idea of what
that means, using straightforward modulation techniques that would
be about 10 signal transitions per cycle of yellow light.
I expect that doesn't make it impossible; instead, if anything, it
makes it likely that Usenet-N will be some kind of horrific distributed
database that absolutely requires a flat address space to work...
I was rather struck by someone else's comparison to ECC, which is
transparent to user-level programs. What kind of smart memory
architecture would you want so that CDR-coding was similarly
transparent, and would you want it? (Would it be just one of a
host of possibilities for memory subsystems that didn't store your
data as written but simply promised to return that value when you
asked for it?)
paul
Why should the data be human written? (current examples:
satellite weather pictures, executables)
Or for that matter human read? (current examples: mirror engines,
executables again)
Arbitrarily large bandwidth will have arbitrarily astounding (and
unpredictable) impact.
> >
> >You clearly don't know what Usenet is being used for these days. If
> >you thought it was plain text articles like those we're blessed with
> >seeing in comp.arch, you're dead wrong.
>
> There is a limit regardless of medium -- even video can't usefully go
> over tens of megabytes per second.
You obviously don't play video intensive computer games :->
Even at 60 fps (still discernible flicker) at 1280x1024x24 bit
you'll be nudging 200 MB/sec ... and that's still way below
(e.g.) IMAX film bandwidth requirements for real-time playback.
<snip>
--
Russell Crook
Not speaking officially for Sun (or anyone else, for that matter).
[about the 70 MB harddisk of the MicroVAX]
"Why waste a huge disk on a single user Workstation?"
- Ken Olson, ex-President of ex-company Digital Equipment Corporation
>You obviously don't play video intensive computer games :->
>Even at 60 fps (still discernable flicker) at 1280x1024x24 bit
>you'll be nudging 200 MB/sec ... and that's still way below
>(e.g.) IMAX film bandwidth requirements for real-time playback.
Video compresses a lot. To within a factor of a few, I see the maximum
useful video rate for human viewing at 10 megapixels times 1 bit per
pixel per frame times 100 fps, or tens of MB/sec.
--
John Carr (j...@mit.edu)
I expect that only some headers will be passed around very shortly,
with really clever cache propagation protocols that makes clients
retrieve an article by message-id on demand, ultimately from the
originating server, cutting traffic and disk requirements down to what
is actually used, and killing half the stupid spam problem, too. News
traffic will follow the readers, not always flood the whole Net. And
since we're all fully connected (in N months, anyway), the off-line
generation of news readers and the desire to keep a full copy of
today's cache/catch of USENET locally will simply go away. Nobody in
their right mind has a desire to keep a "local copy" of the whole
World Wide Web just to read an infinitesimally small fragment of it,
yet that is what people do with USENET because getting out of how it
has been done up to now is hard to get away from, but it really is no
worse than some large ISP doing it the right way. The bonus of such a
design is that most originating servers would be able to hold on to an
article _much_ longer than the receiving servers of today are, and
that, too, would relieve most people of the need to keep local copies
of everything.
#:Erik
--
"When you are having a bad day and it seems like everybody is trying
to piss you off, remember that it takes 42 muscles to produce a
frown, but only 4 muscles to work the trigger of a good sniper rifle."
-- Unknown
> Lieven Marchand <m...@bewoner.dma.be> writes:
>
> > It was very nice to have eight bits to work with
> > with each pointer. Perhaps architecture designers should reconsider
> > giving user programs some bits in the address space.
>
> Now, in order to use those bits, they must be available per word, so
> they must be included in the data path and memory. Thus, each data
> word now has 64 bits for regular data plus these two extra bits for
> L to use.
>
> Now, C vendor (C, of course) and Java vendor J are getting on the
> bandwagon of the newly popular A machines, but they notice that they
> are not getting all they could be getting out of the architecture;
> they notice that there are 66 bits of address width, 66 bits in the
> data path, and a disconnect between the data and address in two of
> the bits, and thus the address space is 1/4 the size that it could
> be if those bits could be made use of. So C and J complain to A.
>
> The A developers then set to work on the new A' (or should we call
> it XA? :-) architecture which includes full functionality for the 65th
> and 66th bit, it comes out and leaves L to reimplement its lisp in more
> general terms or else be lost in the dust.
>
> The moral of this is that whatever we ask of hardware vendors had better
> be general-purpose enough to be useful in any language, or else it will
> eventually fail.
Maybe. Would A really do a new architecture to add 2 bits? They would
probably go for 128 data bits and 4 extra bits. A lot of architectures
have some hidden bits associated with each page entry in the paging
table for operating systems to use. Those haven't been subjected to
the kind of pressure from vendors like you describe either, although
some OS'es use them and some don't. Likewise for security rings. I
don't know offhand of an OS in popular use on Intel that uses all four
rings, but they're not going away either.
--
Lieven Marchand <m...@bewoner.dma.be>
Lambda calculus - Call us a mad club
Oooh, it's a trick question! I know the answer!
"And CDR-coding is one of those reasons."
:->
That's only 5-6 frames per day at a modest 1280x1024 resolution.
(http://www.eecg.toronto.edu/~mann/)
> Duane Rettig <du...@franz.com> writes:
>
> > The moral of this is that whatever we ask of hardware vendors had better
> > be general-purpose enough to be useful in any language, or else it will
> > eventually fail.
>
> Maybe. Would A really do a new architecture to add 2 bits?
Of course not. My whole straw proposal was an illustration to show
how it is dangerous to ask general-purpose hardware manufacturers
for those features that are not likely to be used by other languages,
but it does not speak to the whole problem of convincing them to do
the job in the first place...
> They would
> probably go for 128 data bits and 4 extra bits. A lot of architectures
> have some hidden bits associated with each page entry in the paging
> table for operating systems to use. Those haven't been subjected to
> the kind of pressure from vendors like you describe either, although
> some OS'es use them and some don't. Likewise for security rings. I
> don't know offhand of an OS in popular use on Intel that uses all four
> rings, but they're not going away either.
Correct. Hardware designers will put in things that they feel are
important.
However, sometimes something gets in that benefits lisp and similar
languages: take the taddcc instruction on the Sparc, for example
(and this is a perfect example of how something like this eventually
fails) - The taddcc instruction is perfect for those software
architectures which view a small integer (e.g., "fixnum" in lisp) as
a signed integer shifted left two bits where the least significant bits
are 0. The taddcc/tsubcc instructions make addition and subtraction
of these numbers very efficient on a 32-bit lisp, and allow overflows
or non-fixnums to be software-trapped to a more generic add/subtract
function.
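For those who haven't seen it, the logic that taddcc collapses into a
single instruction plus a branch looks roughly like this in C (just a
sketch of the idea, not SPARC code; generic_add stands in for whatever
bignum-capable slow path the lisp provides):

#include <stdint.h>

extern int32_t generic_add(int32_t a, int32_t b);  /* hypothetical slow path */

int32_t fixnum_add(int32_t a, int32_t b)
{
    int32_t sum;
    if (((a | b) & 3) != 0 ||                 /* an operand is not a fixnum        */
        __builtin_add_overflow(a, b, &sum))   /* 32-bit signed overflow (gcc/clang) */
        return generic_add(a, b);             /* software trap / generic path      */
    return sum;                               /* low bits still 00: still a fixnum */
}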
However, take a look at what has happened:
1. A more recent Sparc architecture release (either V9 or V10, I
forget which) has deprecated taddcc/tsubcc. This is presumably
because there was not enough use for it (though the lisp community,
at least, use it).
2. When we go to the 64-bit architecture, two changes occur:
a. The best representation for a fixnum is a shifted-by-three-bits
number, with three ls-bits 0.
b. Overflow occurs into the sign bit of a 64-bit machine integer,
rather than into the sign bit of a 32-bit machine integer.
The taddcc/tsubcc instructions look at two ls-bits, and check
overflow into the 32nd bit. There are no equivalent instructions
which could be considered "64-bit txxxcc", which would check the
3 ls-bits and a 64-bit signed overflow.
So we have a case where a used-(mostly/only)-by-lisp feature on GP
hardware will eventually go away.
I still believe that the wisest thing that the lisp community can ask
of hardware and operating system vendors is fast user-level trap
capability; with fewer registers saved and fewer cycles between the
trap and the return from the trap. Such a feature would be much more
likely to be widely used than cdr-coding, even in lisp systems.
>I still believe that the wisest thing that the lisp community can ask
>of hardware and operating system vendors is fast user-level trap
>capability; with fewer registers saved and fewer cycles between the
>trap and the return from the trap. Such a feature would be much more
>likely to be widely used than cdr-coding, even in lisp systems.
I've wished for that feature several times over the years,
and I rarely use lisp.
Linux microbenchmarks have at least made people pay a little
attention to trap performance. ("My cheap Linux system does
something useless 10 times faster than your expensive commercial
Unix system.") So far I think the effects have been all in
software.
--
John Carr (j...@mit.edu)
>I still believe that the wisest thing that the lisp community can ask
>of hardware and operating system vendors is fast user-level trap
>capability; with fewer registers saved and fewer cycles between the
>trap and the return from the trap. Such a feature would be much more
>likely to be widely used than cdr-coding, even in lisp systems.
>
I will admit I haven't thought this through, but it seems like you
are almost asking for the ValueCall (VALC) operation on the Unisys
A-series (aka Clearpath NX). The ValueCall opcode will load the value from a
memory location: if the memory location is a value, you're done; if it is a
pointer to data, it dereferences (loads the data pointed to); and if it is a
pointer to code, it calls it, expecting the code to return the relevant data
on the stack (the A-series is a stack machine).
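In C terms, a toy model of the dispatch it does on a cell might look like
this (my invention for illustration, not the actual A-series hardware or
its data formats):

#include <stdint.h>

enum kind { VALUE, DATA_PTR, CODE_PTR };

struct cell {
    enum kind kind;
    union {
        intptr_t   value;
        intptr_t  *data;
        intptr_t (*thunk)(void);
    } u;
};

intptr_t valc(const struct cell *c)
{
    switch (c->kind) {
    case VALUE:    return c->u.value;      /* already a value: done        */
    case DATA_PTR: return *c->u.data;      /* pointer to data: dereference */
    case CODE_PTR: return c->u.thunk();    /* pointer to code: call it     */
    }
    return 0;
}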
And who said Call-by-Name was useless? :)
- Tim McCaffrey
Out of curiosity: what kinds of large commercial
applications use LISP? Do they use it as an embedded
scripting engine, or to write the application proper?
I guess you could ask the same question about functional
languages in general. All I am aware of is Ericsson's
use of Erlang. Phil Wadler has a web page talking
about some others:
http://www.cs.bell-labs.com/who/wadler/realworld/
Manoj
> In comp.arch Duane Rettig <du...@franz.com> wrote:
> > languages, comparatively). However, the very fact that lisp is able
> > to handle large problem spaces well has always allowed it to be used
> > for increasingly larger projects. Currently, we have customers,
> > especially in the CAD realm, whose data spaces push beyond the normal
> > 2 to 3 Gb address spaces allowed by 32 bit systems, thus causing a
> > need to move toward usage of 64-bit addressing. Certainly it was
>
> Out of curiosity: what kinds of large commercial
> applications use LISP? Do they use it as an embedded
> scripting engine, or to write the application proper?
Both are possible, as well as using lisp as glue to hold
together systems with subsystems written in different languages.
However, the applications I refer to, which _must_ use lisp for
its ability to handle large problem sets, tend to be largely
written in lisp.
> I guess you could ask the same question about functional
> languages in general. All I am aware of is Ericsson's
> use of Erlang. Phil Wadler has a web page talking
> about some others:
> http://www.cs.bell-labs.com/who/wadler/realworld/
Looks like a good list, and I notice that Lisp is a major
contributor in that list. However, lisp is not a purely
functional language, per se; I would characterize it as very
general purpose, with good support for functional, object-oriented,
dynamic, etc. programming.
You can also check out our website (below) under "Success Stories",
and also http://www.lisp.org, for more info.
Touche'. However, I would call it Call-by-deReference :-)
I don't know that architecture, but you have the right idea. Some
of the features that would have to be present in order for such an
operation to be useful to us are:
- The memory address would have to be passed into the code when
called, so that new locations for the forwarded data could be
calculated.
- The code would need to also be called in a store operation.
Whether the operation is a read or a write would have to be passed to
the code or else be calculable.
- Access to simple data would have to be as fast as a simple load
instruction.
This implementation seems to require a lot of hardware to implement
(again, I don't know this architecture, but am guessing), and I'd also
guess that there are a couple of extra bits involved, there, to
determine whether the other bits are data, indirect, or code.
A more general purpose and portable read/write barrier would involve
setting memory page attributes in the MMU to "no access" and when the
location is read from or written to, the appropriate actions are
taken by the garbage collector's trap handler to perform the proper
forwarding and accessing of the forwarded object. This would require
a very fast trap handler, at the same order as a first level page
fault handler, perhaps partially implemented in hardware.
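On stock hardware and OSes today, that barrier is typically wired up with
mprotect and a SIGSEGV handler, something like the sketch below
(gc_forward_page is a made-up GC hook; the expensive part is exactly the
kernel round trip that a fast user-level trap would remove):

#include <signal.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

extern void gc_forward_page(void *page);      /* hypothetical GC hook */

static long page_size;

static void barrier_handler(int sig, siginfo_t *si, void *ctx)
{
    void *page = (void *)((uintptr_t)si->si_addr & ~((uintptr_t)page_size - 1));
    gc_forward_page(page);                              /* fix up the page   */
    mprotect(page, page_size, PROT_READ | PROT_WRITE);  /* then retry access */
}

void install_barrier(void)
{
    struct sigaction sa;
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = barrier_handler;
    sa.sa_flags = SA_SIGINFO;
    page_size = sysconf(_SC_PAGESIZE);
    sigaction(SIGSEGV, &sa, NULL);
}

void gc_protect_page(void *page)              /* mark a page "no access" */
{
    mprotect(page, page_size, PROT_NONE);
}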
> Arrays are also extremely inefficient
> where lists are not the wrong data structure.
Yes - but I think that is rare.
> Hence, use lists _and_ arrays, as appropriate.
Of course.
> Lisp programmers are not the idiots you think they are, Per. They
> figure out when it makes sense to use arrays even if you don't and
> need to reduce the problem to silliness.
Did I accuse anyone of being idiots? (I leave that to you.) Even
when arrays are *in principle* a better data structure, it may be
perfectly reasonable to use lists. For example, it may be easier in
Lisp to program with lists than with arrays. While Lisps do support arrays,
they are traditionally based on lists as the fundamental data
structure (for example &rest binds to a list, not an array), so it
just might be easier to use lists, even if they are less efficient.
Note the context of this discussion: Is cdr-coding a good idea?
My answer is that it might be useful for existing code
that uses lists heavily. However, it is reasonable to believe that
any code that would gain significantly from cdr-coding would gain
as much or more from using arrays instead of lists. This
takes away much of the point of implementing cdr-coding.
> If your data structures nest, use lists.
Or nested arrays. Or you can use more clever data structures, such
as the "document table model" that the Xalan XSL processor uses
to represent XML documents.
> If your data structures have unpredictable structure, use lists.
If your data structures have unpredictable structure, you may be
in trouble ...
> If you need to maintain uniformity of type for the "rest" after
> processing an initial portion of the data structure, use lists.
Or use a more abstract sequence model, which would have arrays and
sub-ranges of arrays *and* lists as special cases.
Or use a pair of an array and an integer.
> (If you think this only applies to recursion, which it seems you do, you
> really must think more carefully about algorithm design with arrays
> and why displaced arrays or passing _pairs_ (sorry, data structures in
> small arrays, of course) of the array and the index won't cut it.)
This is too vague for me to answer.
> If you need to manipulate the structure of the data structure in any
> way, not just insert/delete, use lists. (Flattening, pairing, etc.)
It is of course quite reasonable and efficient to do flattening
into an array.
> If after much headache and painfully idiotic programming mistakes, you
> discover that your array implementation really must use two elements
> per "array", one data and one "next", you have reinvented the cons.
> Congratulations! Welcome to the wisdom of 40 years of language design!
Erik, you know that I am older than you, and have probably been
programming at least as long as you. Don't you imagine I might have
quite a bit of experience with both arrays *and* lists? (I have less
experience with Common Lisp than you, I'm sure though.) I think my
share of "painfully idiotic programming mistakes" is modest, yet I
have not found that my "implementation really must use two elements
per "array"".
> Who cares about allocating separate cons cells? Simple arrays have to
> be more than four elements long to take less space than the equivalent
> list using cons cells.
Assume a modest one word of overhead per object. Then an extensible
array with three elements takes 3 words for the array header (header +
current length + pointer to data array) plus 4 words for the data
array (1 word of header plus three words of data) or 7 words total. A
non-extensible array would only need 4 words. If you need a separate
header word for the array allocated length, add one more word yielding
8 or 5 words, respectively. A list of three cons cells takes 9 words.
Already for length three, the array wins.
If you use a memory allocation scheme not requiring any header words,
then the extensible 3-element array takes 6 words, the non-extensible
array takes 4 words, and the list version takes 6 words. The array
is equal to the list in space.
> The allocation for cons cells is extremely efficiently implemented in
> all Lisp implementations.
Unless you use cdr-coding, you will need 2 pointers plus any per-object
overhead (which may be zero) for each list element. This is twice as
much space as using an array. This only wins for short lists, or
for extensible arrays whose size is just past the doubling threshold.
> Using your stupid arrays-for-lists would of
> course force some super-clever implementation of exponent-of-2-sized
> arrays that double in size as they are used _and_ move around all the
> time (destroying object identity unless you waste even _more_ space on
> their already inefficient implementation),
One obvious way to solve the problem of unused space in extensible arrays
(which will still almost always take less space than using lists,
except for very short sequences) is to have the garbage collector trim
off the excess space. (You double the space allocation to deal with
arrays that are actively growing, to prevent quadratic effects.
If they have survived to garbage collection, they are probably not
actively growing.)
An alternative to doubling the size may be to use the Fibonacci sequence
for the array sizes: 3 words, 5 words, 8 words, etc. This reduces the
maximum wasted space to asymptotically about 38%. I think this
preserves linear complexity (i.e. each array element has been copied
an *average* of a small constant number of times), but I'm not sure
enough of my math to say for sure. Does anyone know? (Knuth mentions
Fibonacci memory allocation, but I don't know if he says anything
relevant to this problem.)
> but people have been there,
> many times over, and they have _returned_ to lists, implemented via
> cons cells, when that has been the most intelligent solution.
Because it is convenient, not necessarily because it is efficient.
> You've done so much good work. How come you don't know simple things?
Likewise, I'm sure.
--
--Per Bothner
p...@bothner.com http://www.bothner.com/~per/
> Meanwhile, there's another limit that might be harder to deal with:
> 250 GB/day is about 23 megabits/sec average, i.e. half a t-3 for the
> news backbone. The expanded version of usenet will require about
> 80 exabits/sec for the backbone;
Eh, no, 811 exabytes/day is not 80 exabits/sec, but roughly 80
petabits/sec, which is much easier to handle. ;)
> to give you a rough idea of what
> that means, using straightforward modulation techniques that would
> be about 10 signal transitions per cycle of yellow light.
Does this mean that it's only (roughly) 0.01 signal transitions per
cycle of yellow light, or is that what you get for petabits?
But the time for this problem to arise might still come.
Usenet news isn't the only thing in the world requiring larger amounts
of storage and bandwidth. Video-on-demand has (so far) largely been a
failure for the mass market because of the lack of bandwidth for the
necessary quality (although MPEG-4 and later MPEG-7 may make subtle
changes to this), and sometime, people may actually have those nice
HDTV (or better) displays in their homes. So yes, I think we'll reach
that limit one day. Maybe it won't be in 20 years, but I'll be around
when it happens.
|> I still believe that the wisest thing that the lisp community can ask
|> of hardware and operating system vendors is fast user-level trap
|> capability; with fewer registers saved and fewer cycles between the
|> trap and the return from the trap. Such a feature would be much more
|> likely to be widely used than cdr-coding, even in lisp systems.
Oh Boy, would I like to second that one.
Not just the Lisp community, but other (less well known)
language areas too. I do a lot of work in persistent
systems, and we love to trap pointer dereferences for all
sorts of reasons. Most systems do this in software, but
some try using traps, either through VM level protection
or, on some systems, illegal addresses (in particular systems
like the Alpha where stuffing a 1 into the LSB guarantees
a trap.)
The big killer in any trap based system is the enormous
cost of making the trap traverse the OS. Some of the
earlier Sparc designs were quite brutally expensive.
The MIPS designs could take user level traps, but no OS
allowed access to the feature.
Lastly, I think CDR coding, cute as it is, misses a more useful
use for a tag bit: differentiating pointers from scalar
data. If you are designing any object based language system
you need to collect garbage. Being able to find pointers
really quickly is very useful. Those languages that do not
provide for a canonical object format (Java being the most
recent offender) would really benefit. Of course changes
to the language spec can eliminate this problem. Won't happen
of course. If Java becomes the force some pundits predict
maybe we will see some real hardware support - but since
the language support system is mostly unable to make use
of such support in its current spec, we have a chicken
and egg problem.
However, once you have one tag bit, might as well have them all.
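To make the point concrete, with one pointer/scalar bit the collector's
scan loop needs no per-class layout maps at all; something like this
sketch (the layout is made up: low bit 0 means pointer, low bit 1 means
scalar, and gc_mark is a hypothetical marker):

#include <stddef.h>
#include <stdint.h>

extern void gc_mark(void *obj);                   /* hypothetical marker */

void scan_object(const uintptr_t *words, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        if (words[i] != 0 && (words[i] & 1) == 0) /* tagged as a pointer */
            gc_mark((void *)words[i]);
}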
Francis Vaughan
Really? Reading Lisp data from input streams is well-defined today,
despite an absence of predictability of the objects you read in. It
works for both lists and arrays, but arrays are very inefficient
because you don't know their size a priori, and copying elements like
crazy as the list grows is much less efficient than grabbing cons
cells from a pool of them. The way Lisp readers usually deal with
arrays is in fact to build a list from the input and then convert the
list to an array. You could process the data twice, once to count
elements and once to actually store them in memory. You could rely on
someone else not lying about the number of elements in the array,
which would make the data format impossible to edit by hand. Etc.
I recommend programmers in non-list-supporting languages to implement
the lists in the protocol syntaxes I have defined as arrays for one
good reason: Such languages seldom have automatic memory management,
either, so abandoning a reference causes memory leaks. If they keep
track of the array, they usually avoid the issue of keeping track of
all the individual elements. I have been kind to them by specifying
that no list will be longer than 32 elements, so allocating all arrays
that big remains a possibility. Much of the implementor feedback has
been about the difficulty of reading lists of objects of unpredictable
types, and my recommendation not to convert to an internal type until
you need it, but instead to keep the string form with a prefix character
(the equivalent of a type tag), has resulted in some pretty weird code.
| Erik, you know that I am older than you, and have probably been
| programming at least as long as you. Don't you imagine I might have
| quite a bit of experience with both arrays *and* lists? (I have less
| experience with Common Lisp than you, I'm sure though.) I think my
| share of "painfully idiotic programming mistakes" is modest, yet I
| have not found that my "implementation really must use two elements
| per "array"".
You recommend arrays to other people, not just to yourself, unless I'm
_gravely_ mistaken about the purpose of your article. Other people
who have much less experience than you and rely on your experience in
order to form their own, with the least possible pain. My experience
with people who try to implement things with arrays is that they end
up with very cons-like concepts once they outgrow the C "memory is
really an array of bytes" idea. The reason is simple: Arrays with the
properties we discuss are very, very hard to implement correctly.
If we _are_ talking about what a compiler designer would do for a
language that natively supports these things in such a way that
programmers would never _see_ the array, the kind of reasoning you
have provided is completely misplaced. I have therefore assumed that
you do _not_ argue about primarily what a compiler writer would do,
although CDR-coding _is_ at that level.
| Assume a modest one word of overhead per object. Then an extensible
| array with three elements takes 3 words for the array header (header +
| current length + pointer to data array) plus 4 words for the data
| array (1 word of header plus three words of data) or 7 words total. A
| non-extensible array would only need 4 words. If you need a separate
| header word for the array allocated length, add one more word yielding
| 8 or 5 words, respectively. A list of three cons cells takes 9 words.
| Already for length three, the array wins.
I think you need to explain how you ended up with 9 words for the
list. There is of course no header word for cons cells if you are
going to implement a very basic idea in a language with this. Type
information is stored much more efficiently if you care about these
things, or you allocate them in blocks that only hold cons cells.
That is, arrays of cons cells, if you want. Likewise, your arrays
would have type information in a similarly more efficient way.
Here's how I count. First, the data array. It has one word per
element, but it is aligned at and always allocated in some useful
chunk, like 4 words. This is to make allocation significantly
cheaper. Since the data array may need to be moved when grown, there
is a meta-object that contains the allocation size, the displacement
offset, the active size, and a pointer to the data array, or 4 words,
one allocation unit. The pointer to an allocation unit would of
course have four spare bits that could be used for type information,
just as pointers to cons cells have three spare bits for type
information, avoiding the header word overhead completely for the most
important builtin types. I need 4 cons cells to match an array of 4
elements.
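In C terms, the 4-word meta-object described above would be something
like this (field names are mine, just for illustration):

#include <stdint.h>

struct metaobj {
    uintptr_t  alloc_size;     /* words allocated for the data array  */
    uintptr_t  displacement;   /* offset of element 0 into the data   */
    uintptr_t  active_size;    /* number of elements currently in use */
    uintptr_t *data;           /* pointer to the (movable) data array */
};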
| If you use a memory allocation scheme not requiring any header words,
| then the extensible 3-element array takes 6 words, the non-extensible
| array takes 4 words, and the list version takes 6 words. The array
| is equal to the list in space.
Yes, but only at enormous expense in allocation costs with such a fine
granularity. Also, you really do not want your array to move around
if it grows, or you lose the idea of identity of lists. Holding the
size without a pointer to data that can move means you always have to
copy the array if you do anything at all to it. This means that you
cannot perform any operations on the list implemented this way, only
on the elements of the fixed-size list. One of the charming issues
about lists is that they can grow so easily. If we are to give them a
_more_ efficient implementation, you can't take that away from them.
| Unless you use cdr-coding, you will need 2 pointers plus any
| per-object overhead (which may be zero) for each list element. This
| is twice as much space as using an array. This only wins for short
| lists, or for extensible arrays whose size is just past the
| doubling threshold.
The overhead is twice as much _after_ the overhead of arrays have been
dealt with, including that copying-when-doubling scheme, which out of
necessity must allocate memory for the same element many times over as
the list grows. By cleverly allocating allocation units for the data
array from a single pool that is not used for anything else, the same
way cons cells are allocated, you can grow your list painlessly and
linearly without copying until you allocate another array. You would
typically inform the allocator that you are allocating for a growing
list if you did this, however, to arrange for a large section of that
space to be temporarily reserved for the growing array. It would
revert to normal allocation once you inform the allocator that you
are through growing that particular list. This cuts down on copying,
on reallocating, and speeds up just about everything while growing the
lists linearly, at the cost of a little bit more explicit code and
knowledge sharing between allocator and user. In a language with
support for these things you would want that, even if it would mostly
be used by system functions that build lists as return values.
If you get the impression that I have implemented this scheme for
real, that's exactly right.
| Because it is convenient, not necessarily because it is efficient.
When processing data that produces values of unknown lengths, and I
find myself doing that with an alarmingly high frequency compared to
values of known lengths, the growing pain of the array approach has
costs that do not win in the long run. If you usually process data
that produces values of known lengths with a similarly much higher
frequency than unknown lengths, you still have much higher overhead
than if you avoided the cleverness in the array implementation that
would take care of the growing. Hence, if you actually implement this
and compare approaches and test efficiency at a product-quality level,
lists made up of cons cells actually do win the efficiency contest,
especially with a really fast cons cell allocator, and this is not
least because of the internal proximity of the data, which beats that
of the array that has been moved around several times while the data
it pointed to stayed where it was first stored (unless you implement
different memory pools for array allocation units and data, in which
case you don't get that added cost).
The interesting thing is, however, what the languages look like after
you decide to represent lists as arrays that way. E.g., cdr-ing down
a list can easily be emulated with displaced arrays, i.e., arrays that
really use another array's data for its elements, but starting from an
offset for element 0. (Hence the importance of storing type bits in
the pointer to the allocation unit for meta-object and data array, but
keeping them of the same size and from the same allocation pool.) You
may find that cdr-ing down an array allocates meta-objects, while
cdring down a list does not, because the _rest_ of an array is no
longer just a pointer to a cons cell, but a new displaced array. If
you want to push and pop elements on lists, you find that you want
lists that grow and shrink on different ends, leading to different
coding of many algorithms, or you can be really clever and build
the list using displaced arrays that fill the allocation unit from
right to left and move them around instead. Many cool things that
affect what is efficient and convenient in surprising ways once you
start using it.
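As a sketch of the cdr-ing point above: the "rest" of such an array is a
fresh meta-object displaced one element further into the same data array
(same made-up 4-word meta-object as in the sketch further up):

#include <stdint.h>

struct metaobj {
    uintptr_t  alloc_size;     /* words allocated for the data array  */
    uintptr_t  displacement;   /* offset of element 0 into the data   */
    uintptr_t  active_size;    /* number of elements currently in use */
    uintptr_t *data;           /* pointer to the (movable) data array */
};

struct metaobj array_rest(const struct metaobj *m)
{
    struct metaobj rest = *m;  /* note: a fresh meta-object is created */
    rest.displacement += 1;    /* skip element 0                       */
    rest.active_size  -= 1;    /* one element fewer is visible         */
    return rest;
}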
I did most of this work for a large automotive distributor back in the
late 80's when I had one of my many "pity I can't find a Lisp I can
use" periods and had major use for things Lisp makes easy. The system
was in use until they went from a dial-in terminal-based system over
both X.25 and regular phone lines to a web-based thing. I was greatly
amused when the new system cost 50 times more money to build (adjusted
for inflation and computer geek costs), required 30 times more memory,
processing power, and bandwidth to handle the same load, and almost 20
times more code to get the same work done. My system paid for itself
in 6 months of operation. I doubt this new system ever will pay for
itself before it is replaced or run into maintenance costs that will
keep it in the red for the rest of its life, but it's hot and flashy
and the users are very happy with the nicely laid-out web pages. The
users are also less efficient using it, though. Modern technology!
But I digress nostalgically.
Anybody remember the DEC PDP-10 (and DECsystem-20)?!? A certain number
of instructions were left unspecified, or rather, were specified as
"traps" in the architecture. Half of them were used for kernel calls,
but the other half of these "Unimplemented User Operations", or UUOs
as they were known, trapped to user-mode instead. ISTR you could get
the ill-mem-ref & arithmetic overflows to trap to user too, through
a kernel trampoline, but I could be mistaken about that part.
Or consider the AMD Am29000 family architecture, which used *extremely*
light-weight trampolines on some traps to handle them in user mode --
specifically, the ASSERTs in the register-window code that triggered
"spills & fills". If the address of the user-mode trap handler was
kept in a kernel register (which it was, for the spill-fill handler),
the trampoline only cost you a couple of cycles. E.g., the entire
kernel code for the "spill" trap trampoline:
uspilltrap:
mfsr tpc, PC1 ; save return address
mtsr PC1, uspill ; "branch"
add tav, uspill, 4 ; (sequential fetch)
mtsr PC0, tav ; "branch" completely
iret
Now the instructions which caused these traps were a set of "assert"
instructions that said one register was {EQ,NE,GT,LT,GE,LE} than another,
and the third arg was the trap vector number. These could very easily
be adapted to use for Lisp...
-Rob
-----
Rob Warnock, 31-2-510 rp...@sgi.com
Network Engineering http://reality.sgi.com/rpw3/
Silicon Graphics, Inc. Phone: 650-933-1673
1600 Amphitheatre Pkwy. PP-ASEL-IA
Mountain View, CA 94043
> Nobody in their right mind has a desire to keep a "local copy" of
> the whole World Wide Web just to read an infinitesimally small
> fragment of it, yet that is what people do with USENET because
> the way it has been done up to now is hard to get away from
You're probably right, but a good argument against it is that USENET
actually works in a responsive and reliable manner. WWW seems always
to be too slow and produce too many 40x's.
It can probably be solved with an intelligent limited distribution
system ensuring redundancy of paths and storage, but at the moment, I
think I'd rather just cut all the porn and warez from my feed.
-kzm
--
If I haven't seen further, it is by standing in the footprints of giants
> I expect that only some headers will be passed around very shortly,
> with really clever cache propagation protocols that makes clients
> retrieve an article by message-id on demand, ultimately from the
> originating server, cutting traffic and disk requirements down to what
> is actually used, and killing half the stupid spam problem, too.
I've been hoping for such a thing for about 5 years now, but is there
any work going on...?
--
(espen)
> |> I still believe that the wisest thing that the lisp community can ask
> |> of hardware and operating system vendors is fast user-level trap
> |> capability; with fewer registers saved and fewer cycles between the
> |> trap and the return from the trap. Such a feature would be much more
> |> likely to be widely used than cdr-coding, even in lisp systems.
> The big killer in any trap based system is the enormous
> cost of making the trap traverse the OS.
So would you consider adding a PAL call to your Alpha for the purpose?
Jan
> 2. When we go to the 64-bit architecture, two changes occur:
> a. The best representation for a fixnum is a shifted-by-three-bits
> number, with three ls-bits 0.
Why only 3 bits? I would have thought to use at least 8 bits for tag
information, that or keep with the current 2 bits (if that is
sufficient). Why 3?
> I still believe that the wisest thing that the lisp community can ask
> of hardware and operating system vendors is fast user-level trap
> capability; with fewer registers saved and fewer cycles between the
> trap and the return from the trap. Such a feature would be much more
> likely to be widely used than cdr-coding, even in lisp systems.
Correct. That and considering that not all programs "die" after a
SIGILL or SIGFPE. Sometimes signals are considered "slow" and not
worthy of either fast or correct code. (see the lazy saving of the
fpu flag and the bad interaction of that with signals on Linux for
instance).
Groetjes, Peter
--
LANT nv/sa, Research Park Haasrode, Interleuvenlaan 15H, B-3001 Leuven
mailto:Peter.V...@lant.be Phone: ++32 16 405140
http://www.lant.be/ Fax: ++32 16 404961
Duane> However, take a look at what has happened:
Duane> 1. A more recent Sparc architecture release (either V9 or V10, I
Duane> forget which) has deprecated taddcc/tsubcc. This is presumably
Duane> because there was not enough use for it (though the lisp community,
Duane> at least, use it).
V9 is the latest. Note that according to my manual, only
taddcctv/tsubcctv are deprecated. They left taddcc/tsubcc in with a
note to follow these instructions with an appropriate branch or trap.
However, I have no doubt that taddcc/tsubcc will go away in some
future version.
Note that lots of other instructions were also deprecated in V9:
o all branches are replaced with branch-with-prediction
o The 32-bit multiply and divide instructions are deprecated in favor of
  the 64-bit operations.
Duane> 2. When we go to the 64-bit architecture, two changes occur:
Duane> a. The best representation for a fixnum is a shifted-by-three-bits
Duane> number, with three ls-bits 0.
What's wrong with using 4 of the 16 tag bits for fixnums? Then
fixnums still have 00 as the 2 least significant bits. (Or does ACL
use all 16 tag values already?)
Duane> b. Overflow occurs into the sign bit of a 64-bit machine integer,
Duane> rather than into the sign bit of a 32-bit machine integer.
Duane> The taddcc/tsubcc instructions look at two ls-bits, and check
Duane> overflow into the 32nd bit. There are no equivalent instructions
Duane> which could be considered "64-bit txxxcc", which would check the
Duane> 3 ls-bits and a 64-bit signed overflow.
Not quite. txxxcc will set the 64-bit overflow condition on 64-bit
overflow. However, it doesn't set the 64-bit overflow condition if
the 2 ls-bits aren't zero. This is only indicated in the 32-bit
overflow condition. I don't understand why they didn't make the
64-bit overflow work the same way.
Another annoyance is that the add with carry instruction only adds in
the carry produced by a previous 32-bit operation. There's no
instruction to add the carry produced by a 64-bit operation.
Ray
CDR-coding is an implementation technique for hiding an array structure
behind a list structure.
There are other techniques you can use, for example you can simply partition
memory into "dotted pair territory" and "array territory", and start all
arrays on a known page boundary so that one comparison can tell if you're
looking at an array or a list and a masking operation will get you the
start of the array. This would limit you to a maximum array size (the
granularity of the pages) so you'd still want lists of arrays, but that
still gives you a factor of (page size / array element size) speedup in
indexing, and the memory overhead would be (1 - pointer size / page size).
Or you can use a mixture, and steal one bit to distinguish between conses
and arrays.
But it's all an issue of implementation techniques. You can pretty much hide
the whole thing from the programmer, with a little cleverness.
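In C terms, the comparison-and-mask version is about this much code (the
split address and the granule size are made up for illustration):

#include <stdint.h>

#define ARRAY_TERRITORY ((uintptr_t)1 << 31)   /* hypothetical start of array space */
#define ARRAY_GRANULE   4096u                  /* arrays start on 4K boundaries     */

static int is_array_ptr(uintptr_t p)
{
    return p >= ARRAY_TERRITORY;               /* one comparison                    */
}

static uintptr_t array_start(uintptr_t p)
{
    return p & ~(uintptr_t)(ARRAY_GRANULE - 1); /* one mask gets the array's start  */
}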
> Touche'. However, I would call it Call-by-deReference :-)
>
>
But why coin a new term when there is a perfectly
good, old one that exactly describes what is going
on? VALC was expressly designed to implement
call-by-name.
--
There is probable cause to suspect foul play,
but it is too late for us to find out.
The Supremes
As I recall (although I had nothing to do with Macs back then), early
Macs also used a 24-bit address space, and the same problems arose
when 32-bit addresses were introduced.
> 64-bit architectures tend to define their
> entire virtual space. For example, the PA-RISC with the W bit set defines
> text segments starting at 0x4000000000000000, and data segments starting at
> 0x8000000000000000. Which bits would you steal? Some bits in the middle?
Sure. The allocator needs to know about it, but as long as there's
plenty of virtual space, it's not that hard to do. If there isn't --
and I do believe it would be rash to assume somebody won't find a use
for all that space -- you need another version of your system with no
cdr-coding. It's wasteful not to put all that virtual space to
creative use, if the application doesn't need it all.
> > >Note that one such test that becomes nontrivial on General Purpose
> > >hardware (i.e. not lisp machines) is EQ. Instead of one instruction,
> > >those extra bits must be masked off before the comparison. I don't
> > >know if anyone has ever considered placing those extra bits "elsewhere",
> > >e.g. in a parallel location to the actual pointer/tag word.
I can't figure out what kind of an implementation you envision that
would have a special problem with EQ. I would place the extra bits in
the CAR cell to code the representation of the CDR, and mask them out
when reading the CAR; if you don't mask them out then, there's lots of
other operations that would have to know about them as well, not just
EQ.
--
Pekka P. Pirinen, Adaptive Memory Management Group, Harlequin Limited
The Memory Management Reference: articles, bibliography, glossary, news
<URL:http://www.xanalys.com/software_tools/mm/>
> Both are possible, as well as using lisp as glue to hold
> together systems with subsystems written in different languages.
> However, the applications I refer to, which _must_ use lisp for
> its ability to handle large problem sets, tend to be largely
> written in lisp.
I guess I'm trying to figure out why LISP is unique
in its ability to handle these large problem sets.
What is it about these data sets that prevents use
of C or C++, and *requires* LISP?
I would have thought LISP would be more in demand
because of the dynamic nature of the language,
quick prototyping capability and so on ...
Manoj
PS: Note that I'm not a big fan of C or C++, I believe
that the world should switch to SML and Python as
its primary languages :)
> >>>>> "Duane" == Duane Rettig <du...@franz.com> writes:
>
> Duane> 2. When we go to the 64-bit architecture, two changes occur:
> Duane> a. The best representation for a fixnum is a shifted-by-three-bits
> Duane> number, with three ls-bits 0.
>
> What's wrong with using 4 of the 16 tag bits for fixnums? Then
> fixnums still have 00 as the 2 least significant bits. (Or does ACL
> use all 16 tag values already?)
I assume you mean 4 of the 16 tag _values_. Peter Vaneynde asked a more
inclusive question in another article, so I'll answer this at the same
time I answer his question. It turns out that the number of bits used
for tag makes no difference to the following:
> Duane> b. Overflow occurs into the sign bit of a 64-bit machine integer,
> Duane> rather than into the sign bit of a 32-bit machine integer.
> Duane> The taddcc/tsubcc instructions look at two ls-bits, and check
> Duane> overflow into the 32nd bit. There are no equivalent instructions
> Duane> which could be considered "64-bit txxxcc", which would check the
> Duane> 3 ls-bits and a 64-bit signed overflow.
>
> Not quite. txxxcc will set the 64-bit overflow condition on 64-bit
> overflow. However, it doesn't set the 64-bit overflow condition if
> the 2 ls-bits aren't zero. This is only indicated in the 32-bit
> overflow condition. I don't understand why they didn't make the
> 64-bit overflow work the same way.
Thanks for the reminder; I looked at this over a year ago as part of
feasibility studies to port our 64-bit lisp on the Alpha to other
major architectures. My statement is still correct in its conclusion,
if not in the details; both operations of the instruction must have
64-bit effect if the instruction is to be useful to us. When I saw
the description in the architecture manual, I apparently made a mental
note: "not a 64-bit instruction", even though apparently it does have
_some_ 64-bit effects...
> In article <4aea2i...@beta.franz.com>,
> Duane Rettig <du...@franz.com> wrote:
> >chu...@best.com (Chuck Fry) writes:
> >
> >> And with 64 bits to play with, who's to say we can't spare a couple for
> >> CDR-coding?
> >
> >Let's not make this mistake again. When IBM (and Amdahl) created their XA
> >architectures for the 370 compatibles (extending the address range from 24
> >bits to 32 bits), several programs suddenly broke.
>
> Can anyone come up with a plausible program in the next two decades, to
> run on conventional architectures, that will require an address space
> between 60 and 64 bits? If you say "let's map everything into a flat
> space" you will need more than 64 bits. If you stick to conventional
> addressing, including memory mapping individual objects, you use a lot
> less than 64 bits.
Take a plastic ball and cover the bottom half with microwave detectors
and A/Ds. Hand wave that some horrid researcher will come up with a way
to get 8 bits per sample, even at, say, 500GHz. Hand wave that the packages
are about 5cm. So that is about 200 per ball, without getting too clever
about cramming them all in. So that comes to 1TB per sec.
That's per 'ball'. You have a square km of them, plus extras. Say a spacing
of 2m, that's 500x500, or 250,000. So that comes to 250 EB/sec... Now run it
24 hrs/day for a decade... Or just take 10 hrs of data for analysis. 64
bits does not go far here, just in data size.
Plus, the accounting software may need to be 64+ bits as well :)
--
Paul Repacholi 1 Crescent Rd.,
+61 (08) 9257-1001 Kalamunda.
West Australia 6076
Raw, Cooked or Well-done, it's all half baked.
> In article <m24s09h...@kelso.bothner.com>,
> Per Bothner <p...@bothner.com> wrote:
> > Note the context of this discussion: Is cdr-coding a good idea?
>
> CDR-coding is an implementation technique for hiding an array structure
> behind a list structure.
CDR-coding is an implementation technique for optimizing
the space usage and the locality of lists.
--
Rainer Joswig, Hamburg, Germany
Email: mailto:jos...@corporate-world.lisp.de
Web: http://corporate-world.lisp.de/
> In comp.arch Duane Rettig <du...@franz.com> wrote:
> > pla...@nospam-cs.wisc.edu writes:
> >> In comp.arch Duane Rettig <du...@franz.com> wrote:
> >> > languages, comparatively). However, the very fact that lisp is able
> >> > to handle large problem spaces well has always allowed it to be used
> >> > for increasingly larger projects. Currently, we have customers,
> >> > especially in the CAD realm, whose data spaces push beyond the normal
> >> > 2 to 3 Gb address spaces allowed by 32 bit systems, thus causing a
> >> > need to move toward usage of 64-bit addressing. Certainly it was
> >>
> >> Out of curiosity: what kinds of large commercial
> >> applications use LISP? Do they use it as an embedded
> >> scripting engine, or to write the application proper?
>
> > Both are possible, as well as using lisp as glue to hold
> > together systems with subsystems written in different languages.
> > However, the applications I refer to, which _must_ use lisp for
> > its ability to handle large problem sets, tend to be largely
> > written in lisp.
>
> I guess I'm trying to figure out why LISP is unique
> in its ability to handle these large problem sets.
> What is it about these data sets that prevents use
> of C or C++, and *requires* LISP?
nothing *requires* Lisp, per se, as you can always write a common-lisp
in, for example, C and C++.
> I would have thought LISP would be more in demand
> because of the dynamic nature of the language,
> quick prototyping capability and so on ...
Lisp offers many things that help with a large project. i think the
most useful is that Lisp helps you retain information.
*) you can dump your entire state in a core and easily pick up right
where you left off.
*) you can dump Lisp structures for later recall simply by
readably-printing them. you can even include the functions for
processing the date in the data structure which you store to disk (an
object orientation move that C++ can only dream of).
*) no more lost pointers.
*) no tedious and error prone manual memory management. Lisp usually
has generational garbage collection which can avoid memory
fragmentation issues that C++ usually suffers from.
*) all values have types associated with them, therefore, the meanings
associated with values tend to be retained better.
*) first class functions allow a high level programming style.
*) macros let you warp the language toward your problem set. when the
language makes things you want to do more natural, you do them without
having to concentrate on (and possibly get confused by) gratuitous
detail.
information lossage is probably my biggest problem in using computers
today. i make a file and then forget what it was supposed to do.
> Manoj
>
> PS: Note that I'm not a big fan of C or C++, I believe
> that the world should switch to SML and Python as
> its primary languages :)
>
--
J o h a n K u l l s t a m
[kull...@ne.mediaone.net]
sysengr
> Duane Rettig <du...@franz.com> writes:
>
> > 2. When we go to the 64-bit architecture, two changes occur:
> > a. The best representation for a fixnum is a shifted-by-three-bits
> > number, with three ls-bits 0.
>
> Why only 3 bits? I would have thought to use at least 8 bits for tag
> information, that or keep with the current 2 bits (if that is
> sufficient). Why 3?
OK, deep question. I'll try to answer it thoroughly. [comp.arch,
sorry if this long post seems inappropriate to continue as a
cross-post; My reasoning is that although it has less to do with
computer architecture, it does have to do with Lisp architecture on
general-purpose computers, and may help make sense of some of the
other issues on this thread]
Also, before I start, I must say that I make no claims that this is the
_right_ way to set up the lisp tagging system, and other lisp vendors
may make other design decisions. However, I have a feeling that they
will agree that this is an optimal arrangement. Also, as a historical
note, our Allegro CL was a late-comer to the two-bit-tagged fixnums design;
we moved from a three-bit fixnum tag 7 or 8 years ago and joined both
Lucid(Liquid) and Harlequin(Xanalys) Lispworks in their two-tag-bit
fixnum design. The performance advantage was very specific, and is
described below. I am not sure what other CL vendors have done.
=====
I now want to answer your first question "why not 8 bits for tag?".
Consider that in a byte addressed machine, every bit you "steal"
from the address halves the number of bytes directly addressable
by the rest of the bits. So the size and placement of the tag is
important to consider.
There are two places you could put such a tag: either in the most
significant bits or the least significant bits:
- If you place the tag in the most significant bits, then you can
retain byte-addressability, but you lose half of the address space
for each bit stolen.
Now, even if you are of the opinion that we will never need more
than 56 bits (i.e. 64 - 8) you must still deal with the fact that
programming conventions place data spaces into different ranges
of memory; e.g. Alpha allows memory to be mapped at any location,
but HP sets up address 0x4000000000000000 and higher for text,
and 0x8000000000000000 and higher for data. If you place tags in
the top bits, how are you going to distinguish between these two
spaces?
- If you place the tag in the least significant bits, you lose
granularity in the addressing for each bit used in the tag. This
is not a problem as long as there are enough bits left in the rest
of the word to address the smallest lisp object that will be
allocated, assuming that it is always aligned the same way.
If therefore you take up 8 bits of tag in the least significant
bits, the smallest lisp object you can allocate is 256 bytes.
Seems like quite a large cons cell to me!
In fact, the ideal number of bits to allocate is determined by the
minimum size of a cons cell, which is 8 bytes (i.e. 3 bits of tag)
on a 32 bit lisp, and 16 bytes (i.e. 4 bits of tag) on a 64-bit
lisp.
=====
Finally, it will make the current design easier to describe if I set
down some terms. These are my definitions, and other vendors may use
different terms:
Natural Word: This is a lisp-centric term; in a 32-bit lisp it is a
32-bit value, and in a 64-bit lisp it is a 64-bit value. Various
architectures have different ways to generate 32 or 64 bit
applications. Many set a mode bit which transforms registers. Alpha,
on the other hand, has no 32 bit mode, but provides a software-only
solution called taso (truncated address-space-option). The natural
word is the word size set up by using one of these options.
LispVal: A single natural word that describes any lisp object. It
contains a tag (in this design it is some number of ls-bits) and a
data portion. Depending on the value of the tag, the data portion
is either interpreted as a pointer to storage in the lisp heap, or
is itself the immediate data.
AU (Allocation Unit): A unit of measurement that is two LispVals in
size. All objects in the lisp heap are allocated on an AU aligned
boundary, and are each an even multiple of AUs in size. This is
also the minimum size that a non-cdr-coded cons cell can be, since
it must at least contain a CAR and CDR LispVal within it.
LispObject: A lisp object which is allocated in the lisp heap. It
is always aligned on even AU boundaries.
lisp type: Common Lisp defines many types of objects, many more than
can be distinguished by less than 6 or 7 bits. Thus, for this
kind of tagging situation, one tag value is always given the
meaning "other", which means that one must consult the first byte
of the LispObject itself for the actual type. Tags that are not
the "other" tag describe the lisp type directly.
The number of tag bits in a LispVal is optimally set to 3 bits for
32-bit ports, which allows for 8-byte granularity, sufficient to
address a cons cell of two 4-byte LispVals, and to 4 bits for 64-bit
ports, which allows for 16-byte granularity, sufficient to address
a cons cell of two 8-byte LispVals.
Now, we narrow the discussion to fixnums:
A fixnum is an immediate LispVal, because the portion of it that
is not tag bits is immediate data which represents the value of
the small integer it describes. A fixnum always includes the 0
tag, because it allows fixnums to be manufactured from machine
integers by simply shifting left. The number of bits that provide
the actual value determines the range of fixnums, beyond which
other lisp integers called bignums are allocated to provide a
smooth continuum over a large integer range.
Because fixnums are important for small integer calculations, it
may be desirable to have more than the 32-3 bits or 64-4 bits
for the fixnum value. To do this, we "steal back" some of the
tag bits by actually allocating multiple tag values for fixnums.
To steal back one bit in a 32-bit lisp, we use tags 0 and 4, and
thus the ms-bit of the tag becomes the ls-bit of the fixnum, i.e.
tag 0 is for even fixnums and tag 4 is for odd fixnums. To steal
one bit from a 64-bit lisp, we would use tag 0 for even, and tag
8 for odd fixnums, or to steal 2 bits we would use tags 0, 4, 8,
and 12.
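Concretely, for the 32-bit case that layout comes out as follows (my
sketch, not necessarily the macros any particular lisp uses): 3 tag bits
overall, fixnums own tags 0 and 4, so a fixnum is the machine integer
shifted left by only 2.

#include <stdint.h>

#define FIXNUM_SHIFT 2                       /* 3 tag bits, 1 stolen back    */

static int      fixnump(uint32_t v)      { return (v & 3) == 0; } /* tags 0 and 4 */
static uint32_t make_fixnum(int32_t n)   { return (uint32_t)n << FIXNUM_SHIFT; }
static int32_t  fixnum_value(uint32_t v) { return (int32_t)v >> FIXNUM_SHIFT; }
                                         /* assumes arithmetic right shift   */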
Now the actual design decision becomes "how many tag bits should we
steal for fixnums?"
One criterion for determining this is to realize that every tag bit
which is stolen essentially halves the number of tags which can be
used for direct type determination. I don't remember where I heard
this, or about what language, but as I recall, one typed language
thought so highly of fixnums that they allocated all but one bit of
the tag for fixnums, thus leaving only tags 1, 3, 5, and 7 for three
direct-type tags and an other tag in a 32-bit port.
Raymond Toy asked a question along these lines when he
suggested that we steal 2 bits out of 4 for the 64-bit port, and
asked if we had already used up the tag space. The answer, Ray, is
"No", we have not yet used any more tags; there are many tags open
still in the 64-bit port, so stealing two bits would be possible.
Finally, we come to the design criterion which set our choice; it
is array access efficiency. The majority of simple arrays in any
lisp are either strings or else they contain elements of the same
size as a Natural Word. Now, all array accesses (e.g., AREF, SVREF,
and their inverses) contain an offset component, to compensate for the
tag and array header, and a shift component to compensate for the
difference between the width of the array element and the "width"
of the fixnum index, as established by the number of bits in its tag.
If the width of the array element matches the "width" of the fixnum,
then no shift is necessary, and the access is faster. In fact, tight
loops of array access operations are within 20% of C speed, and in
special cases of atomic array loops (i.e. with no chance of a gc
moving the array) where the tag offset is precalculated, the access
can be 20% faster than C. These loops tend to differ by only
one instruction.
To put some understandability to the above paragraph, assume the
choice we made of two fixnum tag bits for 32-bit ports, and three
tag bits for 64-bit ports: A (simple-array t (*)), which is a vector
of LispVals, differs by the size of the LispVal from element to
element. Thus, the first element of an array is at offset+0,
and the second element of the array is at offset+4 for 32-bit ports
and offset+8 for 64-bit ports. Then, if we consider what the bit
pattern is for fixnums as we count, we note that on 32-bit ports,
fixnums are counted as 0 (for fixnum 0), 4 (for fixnum 1), 8, 12, 16,
etc., or a difference of 4 for each fixnum increment. And in a 64-bit
port, the counting is 0 (fixnum 0), 8 (fixnum 1), 16, 24, 32 ...
or a difference of 8 per fixnum increment. Thus, fixnum increments
track exactly the address increments of array elements, and thus
fixnums are able to directly access array elements after the offset
is taken care of.
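In C terms, the payoff for the 32-bit port is that SVREF is one add and
one load, with the fixnum index used as-is (the offset constant below is
made up; the real one depends on the header layout and the pointer tag):

#include <stdint.h>

typedef uint32_t LispVal;                   /* 32-bit port: one 32-bit word */

static inline LispVal svref(uintptr_t tagged_vector, LispVal fixnum_index)
{
    const uintptr_t offset = 8;   /* hypothetical header-minus-tag offset   */
    return *(const LispVal *)(tagged_vector + offset + fixnum_index); /* no shift */
}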
> Duane Rettig <du...@franz.com> writes:
> > 64-bit architectures tend to define their
> > entire virtual space. For example, the PA-RISC with the W bit set defines
> > text segments starting at 0x4000000000000000, and data segments starting at
> > 0x8000000000000000. Which bits would you steal? Some bits in the middle?
>
> Sure. The allocator needs to know about it, but as long as there's
> plenty of virtual space, it's not that hard to do. If there isn't --
> and I do believe it would be rash to assume somebody won't find a use
> for all that space -- you need another version of your system with no
> cdr-coding. It's wasteful not to put all that virtual space to
> creative use, if the application doesn't need it all.
Right, but at Franz, we are not making such assumptions of non-use; this
is precisely the kind of assumption leading to a rewrite that I think
we should try to avoid.
> > > >Note that one such test that becomes nontrivial on General Purpose
> > > >hardware (i.e. not lisp machines) is EQ. Instead of one instruction,
> > > >those extra bits must be masked off before the comparison. I don't
> > > >know if anyone has ever considered placing those extra bits "elsewhere",
> > > >e.g. in a parallel location to the actual pointer/tag word.
>
> I can't figure out what kind of an implementation you envision that
> would have a special problem with EQ. I would place the extra bits in
> the CAR cell to code the representation of the CDR, and mask them out
> when reading the CAR; if you don't mask them out then, there's lots of
> other operations that would have to know about them as well, not just
> EQ.
Yes, exactly! Without cdr-coding, EQ objects have identical tagged
pointer values (which I call LispVals) and thus no masking is necessary
at all to compare identity. And my example was by no means
exhaustive, but was instead only the simplest example.
>On Tue, 12 Dec 2000 14:45:18 -0500, p...@panix.com (Paul Wallich) said:
>
>> Meanwhile, there's another limit that might be harder to deal with:
>> 250 GB/day is about 23 megabits/sec average, i.e. half a t-3 for the
>> news backbone. The expanded version of usenet will require about
>> 80 exabits/sec for the backbone;
>
>Eh, no, 811 exabytes/day is not 80 exabits/sec, but roughly 80
>petabits/sec, which is much easier to handle. ;)
Oops, got my prefixes swapped.
>
>> to give you a rough idea of what
>> that means, using straightforward modulation techniques that would
>> be about 10 signal transitions per cycle of yellow light.
>
>Does this mean that it's only (roughly) 0.01 signal transitions per
>cycle of yellow light, or is that what you get for petabits?
The latter. Exponent was right, name was wrong.
>But the time for this problem to arise might still come.
>Usenet news isn't the only thing in the world requiring larger amounts
>of storage and bandwidth. Video-on-demand has (so far) largely been a
>failure for the mass market because of the lack of bandwidth for the
>necessary quality (although MPEG-4 and later MPEG-7 may make subtle
>changes to this), and sometime, people may actually have those nice
>HDTV (or better) displays in their homes. So yes, I think we'll reach
>that limit one day. Maybe it won't be in 20 years, but I'll be around
>when it happens.
I don't think so, mostly because I don't think that the notion of "backbone"
will make any sense at that point. There still will be
places where enormous amounts of data get shipped around, but the idea of
big, fat, long pipes connecting a bunch of narrow ends seems less likely.
Btw, video-on-demand requires less bandwidth into each house than does
regular cable TV. It's the aggregate bandwidth, but more importantly the
decentralization, that gets you.
paul
> Really? Reading Lisp data from input streams is well-defined today,
Perhaps, but does it make sense? Presumably your application has some
idea of the structure of the data it is expecting? (Even trees have
structure.)
> but arrays are very inefficient because you don't know their size
> a priori, and copying elements like crazy as the list grows is
> much less efficient than grabbing cons cells from a pool of them.
But that is just my argument - it is *not* much less efficient. If
you double the array as it grows, the "average" element is only copied
*a single time*. (Some elements may be copied about log(n) times, but at
least half the elements have been copied at most once, at least
three-quarters of the elements have been copied at most twice, etc.)
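In Common Lisp terms this is just the ordinary adjustable-array idiom.
A minimal sketch (the helper names are mine; the doubling policy is
the only point):

  ;; Passing the current total size as the extension argument makes
  ;; VECTOR-PUSH-EXTEND roughly double the storage when it runs out.
  (defun make-growable-vector ()
    (make-array 4 :fill-pointer 0 :adjustable t))

  (defun growable-push (x v)
    (vector-push-extend x v (array-total-size v))
    v)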
> The way Lisp readers usually deal with arrays is in fact to build a
> list from the input and then convert the list to an array.
A reasonable thing to do, at least if you have a good GC.
> You recommend arrays to other people, not just to yourself, unless I'm
> _gravely_ mistaken about the purpose of your article. Other people
> who have much less experience than you and rely on your experience in
> order to form their own, with the least possible pain. My experience
> with people who try to implement things with arrays is that they end
> up with very cons-like concepts once they outgrow the C "memory is
> really an array of bytes" idea. The reason is simple: Arrays with the
> properties we discuss are very, very hard to implement correctly.
It is not that hard. Still, I agree it is best if arrays are well
supported by the programming language or at least a good library.
> If we _are_ talking about what a compiler designer would do for a
> language that natively supports these things in such a way that
> programmers would never _see_ the array, the kind of reasoning you
> have provided is completely misplaced. I have therefore assumed that
> you do _not_ argue about primarily what a compiler writer would do,
> although CDR-coding _is_ at that level.
My argument is a little of all of:
(1) I think high-level programming languages should be built around
array support rather than list support. (Actually, I prefer language
primitives to work on generalized sequences, with arrays just being
the default implementation.)
(2) I think programmers should (in general) use arrays rather than
lists, even in Lisp-like languages, at least when the number of
elements is large and performance is important.
(3) As a consequence of (1) and (2), I don't see a lot of justification
for cdr-coding, at least as something one would do in hardware.
> I think you need to explain how you ended up with 9 words for the
> list. There is of course no header word for cons cells if you are
> going to implement a very basic idea in a language with this.
I discussed this case in the following paragraph.
> Type information is stored much more efficiently if you care about
> these things, or you allocate them in blocks that only hold cons
> cells. That is, arrays of cons cells, if you want. Likewise,
> your arrays would have type information in a similarly more
> efficient way.
Agreed, at least for a Lisp-like language. For an object-oriented
language like Java, with many different types and none of them
predominating, it may make more sense to always have a header word. (Of
course CL is object-oriented. However, it differs from Java in that
certain pre-defined types are used very heavily so it makes sense to
optimize for those types in a way that makes less sense for Java.)
> Here's how I count. First, the data array. It has one word per
> element, but it is aligned at and always allocated in some useful
> chunk, like 4 words. This is to make allocation significantly
> cheaper. Since the data array may need to be moved when grown, there
> is a meta-object that contains the allocation size, the displacement
> offset, the active size, and a pointer to the data array, or 4 words,
> one allocation unit.
The displacement offset is not needed for adjustable arrays. It is only
needed for displaced arrays. And you can place the allocation size
at the start of the data array. (It would be needed there anyway if
memory management code needs to scan memory consecutively.) That gives
only 2 words for the header meta-object.
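For concreteness, the two layouts being compared might look roughly
like this (the field names are mine, and neither is anybody's actual
representation -- it's just a picture of which words go where):

  ;; The 4-word meta-object described in the quoted text:
  (defstruct array-header-4
    allocated-size   ; words allocated for the data block
    displacement     ; offset of element 0 within the data block
    active-size      ; number of elements in use
    data)            ; pointer to the (relocatable) data block

  ;; The 2-word version proposed above, with the allocated size
  ;; stored at the front of the data block instead:
  (defstruct array-header-2
    active-size
    data)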
> | If you use a memory allocation scheme not requiring any header words,
> | then the extensible 3-element array takes 6 words, the non-extensible
> | array takes 4 words, and the list version takes 6 words. The array
> | is equal to the list in space.
>
> Yes, but only at enormous expense in allocation costs with such a fine
> granularity. Also, you really do not want your array to move around
> if it grows, or you lose the idea of identity of lists.
But the array object doesn't move if you use a header meta-word, only
the data array moves.
Of course if you want to preserve identity of lists for linked lists,
you will also need an initial header cons.
> The overhead is twice as much _after_ the overhead of arrays have been
> dealt with, including that copying-when-doubling scheme, which out of
> necessity must allocate memory for the same element many times over as
> the list grows.
As I said: Only twice total, on average.
> By cleverly allocating allocation units for the data
> array from a single pool that is not used for anything else, the same
> way cons cells are allocated, you can grow your list painlessly and
> linearly without copying until you allocate another array.
No such tricks are needed to painlessly and *linearly* grow the array,
even *with* copying. Not to say that your scheme is a bad one
- I agree it is a very reasonable thing to do.
> The interesting thing is, however, what the languages look like after
> you decide to represent lists as arrays that way. E.g., cdr-ing down
> a list can easily be emulated with displaced arrays, i.e., arrays that
> really use another array's data for its elements, but starting from an
> offset for element 0.
Agreed. However, my preference is for languages where the programmer
does not usually explicitly "cdr" down a list - or even index them.
I think languages should encourage programmers to work with arrays
as a unit, and language primitives should encourage that. I'm in favor
of "array languages" or what we might loosely call APL-style languages.
In the context of Common Lisp, I prefer the "series" design rather than
the "loop" design.
> > CDR-coding is an implementation technique for hiding an array structure
> > behind a list structure.
> CDR-coding is an implementation technique for optimizing
> the space usage and the locality of lists.
I see these two statements as being equivalent in the context of this
discussion.
Symbolics was concerned with space usage. You don't get the
constant-time access of arrays: accessing the nth element of a
cdr-coded list is not equivalent in terms of speed to accessing the
nth element of an array.
Sent via Deja.com
http://www.deja.com/
>> Can anyone come up with a plausible program in the next two
>> decades, to run on conventional architectures, that will require an
>> address space between 60 and 64 bits? If you say "let's map
>> everything into a flat space" you will need more than 64 bits. If
>> you stick to conventional addressing, including memory mapping
>> individual objects, you use a lot less than 64 bits.
Paul> Take a plastic ball and cover the bottom half with microwave
Paul> detectors and A/Ds. Hand wave that some horrid researcher will
Paul> come up with a way to get 8 bits per sample, even at, say,
Paul> 500GHz. Hand wave that the package is about 5cm. So that is
Paul> about 200 per ball, without getting too clever about cramming
Paul> them all in. So that comes to 1TB per sec.
Paul> That's per 'ball'. You have a square Km of them, plus extras. Say
Paul> a spacing of 2M, that's 500x500, 250,000. So that comes to 250
Paul> EB/sec... Now run it 24 hrs/day for a decade... Or, just take
Paul> 10 hrs data for analysis. 64 bits does not go far here, just in
Paul> data size.
Paul> Plus, the accounting software may need to be 64+ bits as
Paul> well :)
The jump from 16 bit architectures to 32 was a _pretty_ big deal, as
the typical "dynamic scope" of 32 bits, +/-2^31-1, is a whole lot
bigger than +/-32767, and suffices to characterize a lot more
problems.
Jumping from there to 64 bits is a _whopping_ big jump; while 32 bits
isn't quite enough to represent national debts, it's not _far_ off,
and whether we speak of 36 bits, 48, 60, or 64, it's all "quite large
enough" to represent values that required compound data structures
with only 32 bits to play with.
Long and short being:
"No, the accounting software probably does _not_ need 64+ bits;
they'll probably revalue the currency before that becomes
relevant..."
There will indeed always be some "bleeding edge" applications that
could use the maximum capabilities that any technology can ever offer;
that doesn't mean that such technologies will be economically
feasible, or that they will offer enough to "more mundane"
applications to economically justify their development.
But where 8 bits sufficed for some applications, and having more is
gravy, the same is true for successive generations of microprocessors,
and 60 or 64 bits will "suffice" for many classes of applications
regardless of whether we are thinking of numeric ranges or the sizes
of memory addresses.
It should be eminently useful to take 64 bit architectures and waste a
few bits on tagging; cutting ranges from 64 to 60 still leaves a
useful range for _any_ application that has been deployable on a 32
bit architecture.
--
(concatenate 'string "cbbrowne" "@hex.net")
<http://www.ntlug.org/~cbbrowne/>
Referring to undocumented private communications allows one to claim
virtually anything: "we discussed this idea in our working group last
year, and concluded that it was totally brain-damaged".
-- from the Symbolics Guidelines for Sending Mail
> In article <joswig-54D876....@news.is-europe.net>,
> Rainer Joswig <jos...@corporate-world.lisp.de> wrote:
> > In article <917vho$d...@web.nmti.com>, pe...@abbnm.com (Peter da Silva)
> > wrote:
> > > In article <m24s09h...@kelso.bothner.com>,
> > > Per Bothner <p...@bothner.com> wrote:
> > > > Note the context of this discussion: Is cdr-coding a good idea?
>
> > > CDR-coding is an implementation technique for hiding an array structure
> > > behind a list structure.
>
> > CDR-coding is an implementation technique for optimizing
> > the space usage and the locality of lists.
>
> I see these two statements as being equivalent in the context of this
> discussion.
The post points out that the discussion may be fundamentally flawed:
lists and vectors are abstract data types, linked lists and arrays are
implementations. Up until that post, the data type and implementation
were being confused.
--
Joseph J. Pfeiffer, Jr., Ph.D. Phone -- (505) 646-1605
Department of Computer Science FAX -- (505) 646-1002
New Mexico State University http://www.cs.nmsu.edu/~pfeiffer
VL 2000 Homepage: http://www.cs.orst.edu/~burnett/vl2000/
Excuse me? Yes, it makes sense to read Lisp data that the application
does not know the structure of a priori. For one thing, source _code_
matches that description. Data whose structure is only evident after
having been read in as a complete expression also matches very well.
| Presumably you application has some idea of the structure of the data
| it is expecting? (Even trees have structure.)
Yes, it knows they will all be Lisp objects. That's it. This is part
of the significant charm with Lisp's lists.
| But that is just my argument - it is *not* much less efficient. If
| you double the array as it grows, the "average" element is only copied
| *a single time*. (Some elements may be copied about log(n) times, but at
| least half the elements have been copied at most once, at least
| three-quarters of the elements have been copied at most twice, etc.)
There's something wrong with your math. The average number of times
an element is copied is 1.0 if you have (- (1+ (expt 2 n)) m) elements
in the final array with m elements in the initial allocation and
approaches 2.0 asymptotically for (1+ (expt 2 n)) elements in the
final array, since adding the last element will copy everything and
end up with half the allocated array empty (short that one element).
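A throwaway sketch to check those two figures, assuming doubling from
an initial capacity of 1 (i.e. the m = 1 case of the above):

  ;; Average number of copies per element when growing to N by doubling.
  (defun average-copies (n)
    (loop with capacity = 1
          with copies = 0
          for size from 1 to n
          when (> size capacity)
            do (incf copies (1- size))         ; copy the existing elements
               (setf capacity (* 2 capacity))  ; then double the storage
          finally (return (float (/ copies n)))))

  ;; (average-copies 1024) => ~1.0   just after filling a power of two
  ;; (average-copies 1025) => ~2.0   just after the forced reallocation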
| My argument is a little of all of:
|
| (1) I think high-level programming languages should ...
| (2) I think programmers should ...
| (3) As a consequence of (1) and (2), ...
I'd be interested in how you _arrived_ at your preferences, instead of
just having something follow from some random personal favorites.
| (Of course CL is object-oriented. However, it differs from Java in
| that certain pre-defined types are used very heavily so it makes sense
| to optimize for those types in a way that makes less sense for Java.)
This is where you're losing me. I don't see a huge need for making a
basic type more efficient in languages that _don't_ show a predominant
use of that basic type. And why are we discussing Java? I thought it
was already as array-friendly as you wanted languages to be? And Lisp
already has adjustable arrays, so what's the point in redesigning them?
| The displacement offset is not needed for adjustable arrays. It is only
| needed for displaced arrays.
Sigh. Why am I wasting any time on this? The displacement is needed
for the "rest" (cdr) function that people who work on lists, however
they are actually implemented, need very often. If you want to offer
Lisp people something in terms of efficiency, you can't take away
their primitives without changing the entire language under them.
That's what I was trying to get at in the last paragraph I wrote.
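In CL the operation is a one-liner precisely because the offset is
available; a sketch, assuming a general (element-type T) non-empty
vector:

  ;; REST on an array-backed list via a displaced array: shares
  ;; storage, just starts one element later.  This is what the
  ;; displacement offset in the header pays for.
  (defun array-rest (v)
    (make-array (1- (length v))
                :displaced-to v
                :displaced-index-offset 1))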
| And you can place the allocation size at the start of the data array.
Yes, but that is a truly boneheaded decision as it means that the data
array can no longer be specialized to contain a uniform data type --
say double-floats instead of pointers to double-floats (and you really
want that if you go the way of arrays in the first place -- people
even ask for specialized list types, and they won't stop if they know
the underlying implementation is array-based), but must somehow have a
special initial element or consist of a single word that destroys
alignment of the rest of the array. This is just stupid. It's just
like the old string representation that used the first byte to hold
the length and couldn't hold more than 255 characters per string.
| (It would be needed there anyway if memory management code needs to
| scan memory consecutively.)
Really? I thought memory management code followed pointers and chains
from a root set if it were any smart. Randomly walking over memory it
has no idea what contains just seems like a recipe for major disaster.
Some conservative garbage collectors that are bolted onto languages
that were never properly designed to deal with abandoned pointers seem
to make do with interpreting machine words as pointers on a hunch, but
I have always been exceedingly skeptical of the entire approach. If
you keep the size of a vector in a well-known structure with the
pointer to the data array, there is only one way to get at that data
array, anyway.
| That gives only 2 words for the header meta-object.
And much more overhead and increased complexity and general madness.
You have obviously never actually experimented with this in _Lisp_-
like languages. I don't care about optimizing other languages for
adjustable arrays, as we already have efficient adjustable arrays in
Common Lisp, so there's no point in reinventing the wheel, and using
adjustable arrays to represent lists requires concern for the usage of
lists so optimized. I'm sure you can optimize this stuff to death in
Java, but Java doesn't have any need for cdr-coding or more efficient
storage of lists in particular, either, so I must admit to completely
failing to see the whole point of this exercise, except that you get
to argue for your preference for array-based languages, which I do not
share, and am unlikely to begin to share just because of this.
| Of course if you want to preserve identity of lists for linked lists,
| you will also need an initial header cons.
I'm sorry, but is this a real argument? You're talking about data
arrays that move when they grow at the tail end, right? Lists can
grow at the tail end without any "header cons" simply because we just
overwrite the cdr that is nil in the final cons with a pointer to the
cons cell that holds the new element (and a new nil). Lists thus
grown retain their first cons cell, which does not move, and thus
retains identity at the pointer level. You do need a meta-object for
the header of arrays because the data array moves the address of the
first element when it is moved.
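Spelled out as a sketch (the helper name is mine):

  ;; Destructively add X at the tail; LIST must be a non-empty proper
  ;; list.  The first cons -- and hence the list's identity under EQ --
  ;; never moves.
  (defun add-to-tail (list x)
    (setf (cdr (last list)) (list x))
    list)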
| As I said: Only twice total, on average.
Well, you actually said *a single time*, but if you keep doubling it
every time you say it, I'm happy. (Sorry, but I found that funny. :)
| However, my preference is for languages where the programmer does not
| usually explicitly "cdr" down a list - or even index them.
Well, I think it is preferable to ask people who prefer the languages
they are trying to optimize for their ideas on how to optimize them
instead of optimizing the programmers to use some language _they_ do
not prefer simply because the optimizers do.
I'm confused about your purposes, I guess. Promote APL if you like,
but then I don't see why you're even offering suggestions for how to
implement lists efficiently in Lisp. This is mystifying to me.
#:Erik
--
The United States of America, soon a Bush league world power. Yeee-haw!
I think you are correct if (a) the language supports variable length
arrays or (b) the user knows how many elements he will have or (c)
random access of the sequence data is important and (d) no large
amount of sharing of sequence data is needed and (e) no insertion of
elements into the sequence happens.  In other words, it depends.
Also, note that what is supported by the language is often what gets
coded by a user.  If every sequence looks like an array, the user
will avoid insertion and sharing of substructure and start assuming
that O(1) access of any element is present.  This, of course, makes
some code very ugly, but since the user doesn't see any obvious
alternative, he's stuck.
> (3) As a consequence of (1) and (2), I don't see a lot of justification
> for cdr-coding, at least as something one would do in hardware.
If you read my post carefully, you would have seen that nowhere did I
ask whether or not there should be hardware support for such a
feature.  In fact, history has shown that language-specific features
in hardware are usually a bad idea.  What I asked was: given the
mismatch of cache speed and CPU speed, would a software implementation
of CDR-coding be a good implementation technique?  I was hoping that
someone might pop up with some discussion that actually involved
hardware and cycle counts.  Instead I get straw men about hardware
support for the feature.

To me the only valid argument I have seen so far is the one that says
that there are better uses for the tag bits.  This I can believe.
Show me cycle counts that demonstrate that CDR-coding would add time
to the "straight-line" path for list traversal and I will believe
that CDR-coding is still a bad idea.  Arguments that simply assert
that arrays are "generally" better than lists are not convincing.
Lists AND arrays have been around since the advent of computers.
They are both useful under different circumstances, and I don't think
you've given enough evidence to show that an array is the best choice
for the default sequence type for a language.  Certainly, it seems to
be the "most natural" choice for Fortran, C, and Java users, given
that construction, traversal, and manipulation of lists is clunky in
those languages.  (Strangely enough, C++ with the STL and operator
overloading sidesteps this objection to a certain extent, but I
attribute that to luck more than any conscious design effort.)  But
that's hardly a convincing argument overall.
faa
> Excuse me? Yes, it makes sense to read Lisp data that the application
> does not know the structure of a priori. For one thing, source _code_
> matches that description.
Your point seems to be that it is a bad idea to use arrays to represent
complex nested structure, such as Lisp-style source code.  I agree
that arrays are at a disadvantage here, since Lisp source code uses
many variable-sized and short sequences. However, the disadvantage is
minor (at most a small constant factor), and it doesn't really matter,
since source code is relatively small and only kept around during input
and compilation. It is more interesting what happens with large files
and large sequences. That is where arrays win and (I think) lists
have the disadvantage.
> Yes, [the application] knows they will all be Lisp objects.
> That's it. This is part of the significant charm with Lisp's lists.
An application has to know more. Presumably it intends to *do*
something with those Lisp objects.
> There's something wrong with your math. The average number of times
> an element is copied is 1.0 if you have (- (1+ (expt 2 n)) m) elements
> in the final array with m elements in the initial allocation and
> approaches 2.0 asymptotically for (1+ (expt 2 n)) elements in the
> final array, since adding the last element will copy everything and
> end up with half the allocated array empty (short that one element).
I believe you're right.
> | My argument is a little of all of:
> |
> | (1) I think high-level programming languages should ...
> | (2) I think programmers should ...
> | (3) As a consequence of (1) and (2), ...
>
> I'd be interested in how you _arrived_ at your preferences, instead of
> just having something follow from some random personal favorites.
First, I've tried to argue that arrays are a better general-purpose
data structure than lists: You can do things (indexing) that you can't
do efficiently with lists. Sequential traversal (including merging
and appending to the end) is perhaps the most common use of sequences,
and both arrays and lists handle that well, but arrays make it easy
to traverse from either end. Arrays have better locality of reference,
and they take less space (in general - short arrays and extensible
arrays just after resizing might lose out spacewise to lists, but not
by much). Lists have some advantages. Some of the advantages are
historic (i.e. Lisp is based on lists); other may be inherent. For
example some algorithms (tree-sort? - merge-sort? I forget the name)
need linked lists. Other applications may be more convenient to
code using lists, especially in languages like Lisp which historically
are oriented towards lists. So, generally, arrays make a lot of sense,
and if you are going to represent a sequence of values in your
program, in general arrays are more likely to be the better answer
than lists. Now if you are going to be using Lisp, this may be a
little different, just because of Lisp's support for lists. But
from an abstract data-structure point of view (and note this is comp.arch,
not just comp.lang.lisp), arrays usually are to be preferred over lists.
Secondly, if I were designing a programming language from scratch, I
would use arrays rather than lists as the fundamental data type for
implementing sequences, partly for the reasons above. The other
reason is that I like languages that work on collections as a unit,
and provide operations that work on collections (sequences, sets,
relations, trees). Such languages can be very powerful and compact.
I also think they may be easier to use than languages that encourage
recursion to work on lists. I like side-effect free languages, or at
least languages that encourage a side-effect free style. In such a
language, the benefit of using lists instead of arrays is that it is
easy to use recursion to define your algorithms. But using recursion to
scan through a list is, I think, a bad idea, because it is harder to
understand than iteration. Recursion should be used for recursive
data structures. But defining a sequence as a recursive data
structure (a chain of conses) is wrong - you're creating recursion
where you don't need it.
> This is where you're losing me. I don't see a huge need for making a
> basic type more efficient in languages that _don't_ show a predominant
> use of that basic type.
Neither do I. My point was about the different trade-offs of Java and Lisp.
> And why are we discussing Java? I thought it
> was already as array-friendly as you wanted languages to be?
Absolutely not. But that is a different discussion.
> And Lisp
> already has adjustable arrays, so what's the point in redesigning them?
I'm not. You're the one who insists on adding a displacement pointer
to all adjustable arrays.
> Sigh. Why am I wasting any time on this? The displacement is needed
> for the "rest" (cdr) function that people who work on lists, however
> they are actually implemented, need very often.
I guess the problem is we're discussing two different things here:
Arrays vs lists in abstract, and how they are used in Lisp. When
I work on arrays, I don't find the need for a rest function a lot.
If you need the rest/cdr function a lot, then perhaps you shouldn't
be using arrays. Or re-think your algorithms.
> Yes, but that is a truly boneheaded decision as it means that the data
> array can no longer be specialized to contain a uniform data type --
Of course it can. This is what Java primitive arrays are: A non-adjustable
array of some primitive type (for example double-floats). These are
commonly implemented as an object header followed by a length
followed by the data (words, bytes, or whatever).
> This is just stupid.
There are all sorts of tradeoffs, in memory management especially.
> | (It would be needed there anyway if memory management code needs to
> | scan memory consequtively.)
>
> Really? I thought memory management code followed pointers and chains
> from a root set if it were any smart.
Well, for example a compacting phase might be easier if you could
scan memory in order.
> Randomly walking over memory it has no idea what contains
If you have the appropriate header words, you would know what the memory
contains.
> | That gives only 2 words for the header meta-object.
>
> And much more overhead and increased complexity and general madness.
> You have obviously never actually experimented with this in _Lisp_-
> like languages.
Of course I have. However, I have done most of my work in the context
of the Java VM, which has other tradeoffs than if you can custom-design
your own memory-management. There are other Lisp implementations
that use the Boehm conservative collector (or similar), perhaps
because interoperability with C is desired, or because they would
rather use an existing well-engineered GC instead of writing their own.
> Lists can grow at the tail end without any "header cons" simply
> because we just overwrite the cdr that is nil in the final cons
> with a pointer to the cons cell that holds the new element (and a
> new nil).
Not if the list can start out empty.
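The usual workaround is exactly the header cons mentioned above; a
rough sketch (the names are mine):

  ;; A sentinel cons gives even an initially-empty list a stable
  ;; identity; the real elements live in its cdr.
  (defun make-headed-list () (list :header))

  (defun headed-add-tail (h x)
    (setf (cdr (last h)) (list x))   ; works even when the list is empty
    h)

  (defun headed-elements (h) (cdr h))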
> | As I said: Only twice total, on average.
>
> Well, you actually said *a single time*, but if you keep doubling it
> every time you say it, I'm happy. (Sorry, but I found that funny. :)
Twice referred to the number of memory accesses to write the array
(move once on average plus initialization). I'll bump that up to thrice,
if you like.
> I'm confused about your purposes, I guess. Promote APL if you like,
> but then I don't see why you're even offering suggestions for how to
> implement lists efficiently in Lisp. This is mystifying to me.
I am not talking about how to implement lists efficiently in Lisp, if
by "lists" you mean the "list" type of Common Lisp, except that I
don't think there is much point in implementing cdr-coding, at least
in hardware. What I am talking about is how to implement sequences or
lists in the more generic sense.
>
> Your point seems to be that it is a bad idea to use arrays to represent
> complex nested structure, such as Lisp-style source code.  I agree
> that arrays are at a disadvantage here, since Lisp source code uses
> many variable-sized and short sequences. However, the disadvantage is
> minor (at most a small constant factor), and it doesn't really matter,
> since source code is relatively small and only kept around during input
> and compilation. It is more interesting what happens with large files
> and large sequences. That is where arrays win and (I think) lists
> have the disadvantage.
I think this paragraph sums up what is wrong with your argument for
Lisp. You seem to be assuming that people typicially use huge,
shallowly-nested lists which would `obviously' be better represented
in some array form. Well, I just don't think that's true for
well-written Lisp programs. I don't have general evidence, but in
programs I've written I've done some measurement of this kind of
thing, because I was concerned about things like search times, and I
found mean lengths of about 6 elements and mean nesting depths about
the same.
I suspect this isn't the case for Java, but Java programs perhaps
don't deal with the same kind of data that Lisp ones do.
--tim
What's wrong with STL?
If you have to use C++ anyway, STL has a lot of very useful ideas.
> My argument is a little of all of:
>
> (1) I think high-level programming languages should be built around
> array support rather than list support. (Actually, I prefer language
> primitives to work on generalized sequences, with arrays just being
> the default implementation.)
I.e. STL <vector> etc.
Terje
--
- <Terje.M...@hda.hydro.com>
Using self-discipline, see http://www.eiffel.com/discipline
"almost all programming can be viewed as an exercise in caching"
Well, OK. cdr-coding is all about efficient implementation of lists in
Lisp, and the reason I insist on that displacement offset that you reject
out of hand even with the explanation is that the context of the whole
thread has been how to store lists more efficiently, while obviously
keeping all the listness qualities. I'm sorry you have missed this.
I don't have a problem with your desire to use arrays. I use them all
the time myself, and so do other Lispers. I don't see a conflict, and I
don't see a need to optimize Lisp any more towards arrays, as the support
for the abstract type "sequence" is already there, too.
It seems to me that you think "Lisp" is the Scheme branch of things,
which I barely think is a Lisp at all -- I think Scheme is an Algol with
Lispy syntax and some functional properties. There are many reasons I
loath Scheme, but one of them is certainly that it exposes the choice of
underlying types so much in the interest of giving you that Algol feel of
efficiency and typeness.
> You're probably right, but a good argument against it is that USENET
> actually works in a responsive and reliable manner. WWW seems always
> to be too slow and produce too many 40x's.
One of the reasons for that is that not all news servers get all of
Usenet. Thank goodness.
Another argument against changing it to fetch-by-demand is that the
way things are fed now, it works like crude pre-caching.
(Fetch-by-demand can be implemented in a smart way, too, of course,
lessening the problem, and even combined with pre-caching. Isn't that
nice.)
--
In the beginning was the Bit, and the Bit was Zero. Then Someone
said, Let there be One, and there was One. And Someone blessed them,
and Someone said unto them, Be fruitful, and multiply, and replenish
the Word and subdue it: and have dominion over every thing that is.
> Erik Naggum <er...@naggum.net> writes:
>> I expect that only some headers will be passed around very shortly,
>> with really clever cache propagation protocols that makes clients
>> retrieve an article by message-id on demand, ultimately from the
>> originating server, cutting traffic and disk requirements down to what
>> is actually used, and killing half the stupid spam problem, too.
This is a bit too crude, unless it's your local server doing the
fetching for you, not the client (user agent). But that may be what
you mean with "clever cache propagation".
However, the originating/injecting server must have the bandwidth or
other capacity for dealing with external requests for articles or even
storing articles for as long as necessary (that is, until some
reasonable amount of time has passed). If those requirements aren't
met, the current model seems much better to me.
Considering that there are great variances in how long articles are
stored from news provider to news provider, it seems likely that there
is a significant number of users who want to read older articles. It
isn't unreasonable to assume there will be a good amount of arbitrary
requests for older articles at the originating server, say up to a
month after the article was posted. Someone with a large user/poster
base will have to upgrade their injecting servers. :)
Another issue is that if the injecting server is somewhere remote in
Australia and your client is in Norway, response will be slow,
reducing the usefulness of Usenet compared to the web.
Ketil Z Malde has a point when he talks about the responsiveness of
today's Usenet; it's very important for the user that the articles
requested appear "immediately". (There hasn't been much research on
Usenet, but I believe it's safe to apply relevant aspects of usability
studies of the web.)
> I've been hoping for such a thing for about 5 years now, but is there
> any work going on...?
From what I understand, it isn't uncommon to deal with header-only
feeds, and Diablo supports fetching Message-IDs from other servers by
demand (automatic redirecting to the other news server). The latter
seemed to work well when I tested it when the news servers in the
chain were topologically close. I didn't test with servers on the
other side of the globe, though.
Good call, that man. Thanks for putting what I meant into clearer terms.
This is not an issue for architecture designers anymore, because
you can use paging to remap the addresses the way you wish. This
will only work with mid to top bits of course.
You can take linux and do the remapping the way you wish if it's
really that important.
Miha
If I remember correctly, back around 1980 or so, if you asked an IBM
engineer why they "needed" to move past 24-bit addressing to 31-bit, the
reason given was unrelated to application programming. That is, no one
needed to write programs which required that much memory, except for one
very special group of programmers ... the programmers who wrote IBM's
database management systems and operating systems.
Along with the big, new address spaces (actually just before them) came
additions to the instruction-set which permitted and protected the ability
to switch virtual storage context from one virtual address-space to
another (note that this is orthogonal to the ability to switch from
"kernel-mode" to "user-mode"). In practical terms, this permitted the OS
itself to spread ITS OWN data-structures across multiple address spaces in
order to circumvent the 16MB architectural limit. DBMS's did the same
thing.
The design and performance "win" in both cases was to permit the OS's
paging routines and the firmware's virtual-storage enhancements, which
were highly-optimized, to manage the buffering of data and data-structures
in real-memory. The "applications", ie. the bulk of the OS and DBMS, were
thereby relieved of these duties: coding and design were simplified.
I should mention that due to previous architecture/design choices, not all
of the older 16MB address space was available to the OS and/or DBMS
(and/or application) for exclusive, private use. Thus the effective
available virtual storage was less than 16MB, sometimes considerably less,
dependent on local OS changes as well. Even more reason to enlarge it and
to permit access to multiple virtual address spaces.
Any IBM folks want to chime-in here? Are all of the System/370 hands
extinct already? I haven't touched a 'frame in 9 years (don't miss
them either :-))
* Nick Geovanis The data were not anything that would lead anybody
| IT Computing Svcs to conclude the Fed would do anything different
| Northwestern Univ than what they already believed they were likely
| n-geo...@nwu.edu to do. - Chief Stock Market Analyst, 12/14/2000
+-------------------> Cantor Fitzgerald and Co.
> I should mention that due to previous architecture/design choices, not all
> of the older 16MB address space was available to the OS and/or DBMS
> (and/or application) for exclusive, private use. Thus the effective
> available virtual storage was less than 16MB, sometimes considerably less,
> dependent on local OS changes as well. Even more reason to enlarge it and
> to permit access to multiple virtual address spaces.
>
> Any IBM folks want to chime-in here? Are all of the System/370 hands
> extinct already? I haven't touched a 'frame in 9 years (don't miss
> them either :-))
a major MVS issue (in 24bit addressing land) was that it had inherited
from SVS, MVT, PCP (back to early '60s), etc a paradigm/design where
the supervisor occupied the same address space as the application
programs. This was slightly mitigated in the SVS->MVS (in the early
'70s) where the restriction that all applications programs and all
supervisor functions occupy the same 24-bit address space was slightly
lifted (single virtual storage ... became multiple virtual storage ...
where there was a 16mbyte address space for each application ... with
the MVS supervisor and various subsystem components residing in each
one).
However, by the late-70s, having all supervisor functions in the same
address space as the application, along with various supervisor
subsystem requirements, was again starting to severely stress the
24bit address limit.
while some of the MVS gurus might have believed that they were the
biggest address space hogs in the world, some MVS installations were
having difficulty leaving even 6mbytes (of the 16mbytes) available to
application programs. There were actual applications in the '70s that
demonstrated large address space appetites. Some of these were large
database transaction subsystems that had to exist in the tiny space
left in the 16mbytes space after the MVS supervisor and subsystem
requirements were met.
In the initial port of apl/360 to cms/apl ... the apl workspace
limit was opened up from typically 32k-64k bytes to just under
16mbytes. There were a number of applications that actually took
advantage of the increased workspace size.
One of those were the HONE service. This was the service in the '70s
and '80s that supported world-wide sales, marketing, hdqtrs, and field
support operations. One example, starting sometime in the mid '70s,
IBM mainframe orders became so complex that they couldn't be done
manually; a HONE application was needed to fill in the order. Another
big use of HONE was for economic planning & modeling ... much of the
"what-if" processes done today on PC spreadsheets were performed in
APL.
In '77, the US HONE operations were consolidated in a single location
in california with, what was at the time believed to be the largest
single-system image operation in the world (cluster of SMP processors
sharing large disk farm). In '78/'79, the single-system image was
replicated in dallas and boulder providing disaster survivability
support (in case of national disaster, like an earthquake in
cal.). This was in addition to various HONE clones that resided in
numerous countries around the world.
Almost the entire HONE "experience" was delivered first on cms/apl and
then later on apl/cms.
random refs:
http://www.garlic.com/~lynn/2000f.html#62
http://www.garlic.com/~lynn/2000f.html#30
http://www.garlic.com/~lynn/2000.html#75
http://www.garlic.com/~lynn/99.html#112
--
Anne & Lynn Wheeler | ly...@garlic.com - http://www.garlic.com/~lynn/
I consider today's USENET distribution a crude cache propagation
mechanism and propose something that uses much less bandwidth and disk
space while it maintains the principle of nearby caches. A few years
ago, for instance, the five largest USENET sites in Oslo, Norway, were
all located within a square mile, duplicating each other with great
precision and speed because they had different user bases and owners.
The extant news software could not propagate news articles by any
other means than flooding every server with everything, but if the
news protocols were only slightly smarter, they could have waited
until they needed the article body before they requested it, if they
had the headers that they could show to users. Users would not have
noticed those delays, and if there were any millisecond delays, it
would be only for the first reader that day/week/whatever. Now, scale
this up and consider large (network-topological) regions cooperating
to avoid duplicating news articles and traffic needlessly. How many
USENET servers are there around the core interchange points these
days? Distributed, redundant load balancing is not exactly news to
people who care about distributed systems, but we still have people
who worry needlessly if they cannot have their local USENET feed. If
you get upset because you cannot read news because a remote server is
down, you are a USENET addict and need treatment, not better servers.
The problem with the whole current design is, however, that you do not
know who the originator (initial injector) and owner is and you cannot
request the article except from another nearby cache, so if you have a
message-id that some article is a response to, you are basically hosed
if you have not been so lucky as to have it flood by you. That does
not mean there _is_ no originator and owner, so I consider it useful
to think of the injecting user as _virtually_ reachable.
| However, the originating/injecting server must have the bandwidth or
| other capacity for dealing with external requests for articles or even
| storing articles for as long as necessary (that is, until some
| reasonable amount of time has passed). If those requirements aren't
| met, the current model seems much better to me.
The current in-flow rate is a problem. I have never heard of any
USENET site that has out-flow problems for articles originating at
their site. Keeping your "own" articles around for months should be
the smallest issue compared to keeping everybody else's articles
around for weeks and days. You can increase your out-flow a hundred-
fold and still be _far_ short of the current in-flow, and this makes
it possible for leaf sites to have (relatively speaking) _very_ small
caches while their preferred nearest cache (akin to the site they get
their feed from today) holds on to articles as long as one of their
many leaf sites requests it. The great thing with a design like this
is that you can upgrade from the leaf sites "inward" to the core
servers, because as long as there are major core servers who still
operate in the "old way", there will be much less pressure on the
injecting site. Given how slowly changes occur in the USENET core
technology, the chances are very, very good that there will remain a
number of huge servers who can and will act as proxies for the many
injecting sites that will never see any significant network load.
| Considering that there are great variances in how long articles are
| stored from news provider to news provider, it seems likely that there
| is a significant amount of users who want to read older articles.
Yes, that's the idea, to keep the original article available longer.
| It isn't unreasonable to assume there will be a good amount of
| arbitrary requests for older articles at the originating server, say
| up to a month after the article was posted. Someone with a large
| user/poster base will have to upgrade their injecting servers. :)
Methinks you have lost all sense of proportion and need to back up and
look at the numbers you give for the current situation and its growth
and consider who the people are who contribute the volume of data that
you refer to. Yes, there are some large sites, perhaps responsible
for as much as 1/1000th of the total volume on USENET each, but they
_already_ have 1000 times the capacity to handle "injections" and if
you ask them 100 times for every article they have published, they
are still at 1/10th their old bandwidth, and they aren't going to fill
the remaining 9/10th with requests from other servers, either.
| Another issue is that if the injecting server is somewhere remote in
| Australia and your client is in Norway, response will be slow,
| reducing the usefulness of Usenet compared to the web.
Really? How come I can pick up the article from any number of very
nearby caches today? Hmmm. Mystifying! How _did_ they get there?
| Ketil Z Malde has a point when he talks about the responsiveness of
| today's Usenet; it's very important for the user that the articles
| requested appear "immediately". (There hasn't been much research on
| Usenet, but I believe it's safe to apply relevant aspects of usability
| studies of the web.)
Yeah, I think you guys have got it _exactly_ right. Of course I'm out
to destroy USENET and its usability when I suggest that we invent a
better way to propagate articles. Of course I'm trying my very best
to screw with the minds of people who implement the software so _they_
also think "Hey, this USENET thing really was a bad idea to begin
with, so let's just do something utterly braindamaged that will kill
it and hopefully everybody using it, too". Get a _grip_, guys!
If you don't understand cache propagation mechanisms, say so. If you
cannot even _imagine_ that somebody out there has tried to think
about the way USENET propagates _while_ trying to keep it working for
its users, I suggest you try to insult people openly instead of just
assuming they have extra spicy garlic meatballs for brains. Sheesh.
Today's USENET propagation model does not try to keep track of where
articles are read, that is the task of local news admins, most of whom
take some pride in providing "everything" to their users. If we did
not ship the articles until they were read, only enough header stuff
that people could see newsgroups listings and stuff, the traffic would
change patterns to adapt to the readership instead of the global flood
algorithm used today. This would cause an increased ability to read
"fringe" newsgroups (it's always a hassle to get a news admin to get
another weird newsgroup hierarchy, because only one group is too much
work and the whole hierarchy is too much data), with actual _user_
input to the distribution of news articles.
| From what I understand, it isn't uncommon to deal with header-only
| feeds, and Diablo supports fetching Message-IDs from other servers by
| demand (automatic redirecting to the other news server). The latter
| seemed to work well when I tested it when the news servers in the
| chain were topologically close. I didn't test with servers on the
| other side of the globe, though.
I'm aware of Diablo and strongly encourage further development along
those lines, but as the network of servers picks up messages, they
will naturally be cached many places along the way, not very much
different from today's system. Having to follow a chain of forwarding
servers to get a particular article is therefore very unlikely, unless
you read "fringe" newsgroups that nobody else in your vicinity reads.
When you do that, you might also well tolerate longer access times.
> What I asked was given the mismatch of cache speed and CPU speed,
> would a software implementation of CDR-coding be a good
> implementation technique.
This is a reasonable question. CDR-coding could conceivably be a
useful implementation technique in certain environments optimized for
large Lisp code bases. That assumes the effort to implement it is
reasonable; I'm not convinced you could not get a better payoff by
simply using arrays in place of lists, at least in those places where
cdr-coding would make a difference. But that might require re-writing
working code, which I can see people would rather not do!
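As a very rough sketch of the per-step work in a software scheme (the
2-bit encoding and the word-level accessor below are assumptions of
mine, not any shipped implementation):

  ;; Following a software cdr code: one dispatch on the code bits,
  ;; and only sometimes an extra load.  READ-WORD is assumed to
  ;; return (values datum cdr-code) for a word address.
  (defun coded-cdr (address read-word)
    (multiple-value-bind (datum code) (funcall read-word address)
      (declare (ignore datum))
      (case code
        (0 (1+ address))                        ; cdr-next: next cell follows
        (1 nil)                                 ; cdr-nil: end of list
        (t (funcall read-word (1+ address)))))) ; normal: load the cdr pointer

The extra compare-and-branch per CDR is the part the older analyses
blamed for stalls; whether it is paid for by the smaller cells staying
in cache is exactly the cache-versus-cycles question being asked.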
> [List and arrays] are both useful under different circumstances
Agreed.
> and I don't think that you've given enough evidence to show that an
> array is the best choice for the default sequence type for a language.
Well, reasonable people can differ on this one.
> Certainly, it seems to be the "most natural" choice for Fortran, C,
> and Java language users, given that construction, traversal, and
> manipulation of lists is clunky in those languages
Using Lisp-style lists is quite straight-forward in Java. What it lacks
is a standard pre-defined list class based on conses. Using lists in
languages that lack a garbage collector is likely to be more tedious.
> You seem to be assuming that people typicially use huge,
> shallowly-nested lists which would `obviously' be better represented
> in some array form. Well, I just don't think that's true for
> well-written Lisp programs. I don't have general evidence, but in
> programs I've written I've done some measurement of this kind of
> thing, because I was concerned about things like search times, and I
> found mean lengths of about 6 elements and mean nesting depths about
> the same.
I think this actually supports my point, since arrays of length 6
are more space-efficient than lists of length 6. (With some quibbles
about implementation techniques for adjustable vs non-adjustable arrays.)
I agree it wouldn't make that much difference either way for short lists.
One might also use data structures differently in an array language.
The classic example is a matrix, which is more efficiently represented
using a single vector in row-major order, rather than a nested vector
of vectors. Another example is using two arrays to represent a tree.
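The row-major case in a few lines (a sketch; in CL one would normally
just use a two-dimensional array and ROW-MAJOR-AREF):

  ;; Flat row-major storage: one vector plus index arithmetic,
  ;; instead of a vector of row vectors.
  (defun mref (data ncols i j)
    (aref data (+ (* i ncols) j)))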
> I suspect this isn't the case for Java, but Java programs perhaps
> don't deal with the same kind of data that Lisp ones do.
Probably not - and good Java programmers would be likely to structure
a solution differently than good Lisp programmers would.
many 3033 shops in late '70s (basically a souped up 370/168) started
seeing these real-storage constraint problems.
the issue was how to address the problem.
CCW (i.e. i/o transfer) already supported 31bit real address with
IDALs.
To get some slight relief, the 3033 introduced a hack for getting more
than 16mbyte real storage. The 370 pagetable entry is 16bits, 12bits
for specifying real page number (when combined with 12bit, 4k pages,
yields 24bit addressing), an invalid bit, a protection bit, and two
unused bits ... something like
NNNNNNNNNNNNPxxI
where "xx" are the unused/undefined bits. The 3033 hack was to use the
two "xx" bits and prefix them to the 12bit page number to allow
addressing up to 2**14 real pages ... or 2**26, 64mbytes of real
storage. Executable code was still limited to 24bit virtual addresses
but it was possible to allocate virtual pages in real storage above
the 24bit line by setting the appropriate bits in the page table
entry. And of course, the standard 370 CCW IDALs already had 31bits
available for addressing real storage in i/o operations.
cross-memory services was also introduced with the 3033. in an attempt
to help get some of the supervisor subsystem functions out of the same
address space as the application (at least get things to the point
where maybe a whole virtual 8mbytes was available to applications)
... and not to have a significant downside impact on the MVS
"address-pointer" parameter paradigm, these supervisor subsystem
functions had to reside in their own address space while still
directly supporting services requiring addressing of the application
virtual address space. cross-memory services introduced new addressing
modes allowing instructions to address virtual storage different than
the virtual address space that they were executing in.
random refs:
http://www.garlic.com/~lynn/99.html#99
http://www.garlic.com/~lynn/99.html#7
http://www.garlic.com/~lynn/2000e.html#57
http://www.garlic.com/~lynn/2000g.html#11
oops finger slip, that should have been:
http://www.garlic.com/~lynn/99.html#190
giving the 3033 announce & ship dates
You might be interested in NIAL (Nested Interactive Array Language)
--
Paul W. DeMone The 801 experiment SPARCed an ARMs race of EPIC
Kanata, Ontario proportions to put more PRECISION and POWER into
dem...@mosaid.com architectures with MIPSed results but ALPHA's well
pde...@igs.net that ends well.