> Duane Rettig <du...@franz.com> writes: > > 64-bit architectures tend to define their > > entire virtual space. For example, the PA-RISC with the W bit set defines > > text segments starting at 0x4000000000000000, and data segments starting at > > 0x8000000000000000. Which bits would you steal? Some bits in the middle?
> Sure. The allocator needs to know about it, but as long as there's > plenty of virtual space, it's not that hard to do. If there isn't -- > and I do believe it would be rash to assume somebody won't find a use > for all that space -- you need another version of your system with no > cdr-coding. It's wasteful not to put all that virtual space to > creative use, if the application doesn't need it all.
Right, but at Franz, we are not making such assumptions of non-use; this is precisely the kind of assumption leading to a rewrite that I think we should try to avoid.
> > > >Note that one such test that becomes nontrivial on General Purpose > > > >hardware (i.e. not lisp machines) is EQ. Instead of one instruction, > > > >those extra bits must be masked off before the comparison. I don't > > > >know if anyone has ever considered placing those extra bits "elsewhere", > > > >e.g. in a parallel location to the actual pointer/tag word.
> I can't figure out what kind of an implementation you envision that > would have a special problem with EQ. I would place the extra bits in > the CAR cell to code the representation of the CDR, and mask them out > when reading the CAR; if you don't mask them out then, there's lots of > other operations that would have to know about them as well, not just > EQ.
Yes, exactly! Without cdr-coding, EQ objects have identical tagged pointer values (which I call LispVals) and thus no masking is necessary at all to compare identity. And my example was by no means exhaustive, but was instead only the simplest example.
-- Duane Rettig Franz Inc. http://www.franz.com/ (www) 1995 University Ave Suite 275 Berkeley, CA 94704 Phone: (510) 548-3600; FAX: (510) 548-8253 du...@Franz.COM (internet)
>> Meanwhile, there's another limit that might be harder to deal with: >> 250 GB/day is about 23 megabits/sec average, i.e. half a t-3 for the >> news backbone. The expanded version of usenet will require about >> 80 exabits/sec for the backbone;
>Eh, no, 811 exabytes/day is not 80 exabits/sec, but roughtly 80 >petabits/sec, which is much easier to handle. ;)
Oops, got my prefixes swapped.
>> to give you a rough idea of what >> that means, using straightforward modulation techniques that would >> be about 10 signal transitions per cycle of yellow light.
>Does this mean that it's only (roughly) 0.01 signal transitions per >cycle of yellow light, or is that what you get for petabits?
The latter. Exponent was right, name was wrong.
>But still, the time for this problem to arise might still come. >Usenet news isn't the only thing in the world requiring larger amounts >of storage and bandwidth. Video-on-demand has (so far) largely been a >failure for the mass market because of the lack of bandwidth for the >necessary quality (although MPEG-4 and later MPEG-7 may make subtle >changes to this), and sometime, people may actually have those nice >HDTV (or better) displays in their homes. So yes, I think we'll reach >that limit one day. Maybe it won't be in 20 years, but I'll be around >when it happens.
I don't think so, mostly because I don't think that the notion of "backbone" will make any sense at that point. There still will be places where enormous amount of data get shipped around, but the idea of big, fat, long pipes connecting a bunch of narrow ends seems less likely.
Btw, video-on-demand requires less bandwidth into each house than does regular cable TV. It's the aggregate bandwidth but more important the decentralization that gets you.
Erik Naggum <e...@naggum.net> writes: > Really? Reading Lisp data from input streams is well-defined today,
Perhaps, but does it make sense? Presumably you application has some idea of the structure of the data it is expecting? (Even trees have structure.)
> but arrays are very inefficient because you don't know their size > a priori, and copying elements like crazy as the list grows is > much less efficient than grabbing cons cells from a pool of them.
But that is just my argument - it is *not* much less efficient. If you double the array as it grows, the "average" element is only copied *a single time*. (Some elements may be copied n*log(n) times, but at least half the elements have been copied at most once, at least three-quarter of the elements have been copied at most twice, etc.)
> The way Lisp readers usually deal with arrays is in fact to build a > list from the input and then convert the list to an array.
A reasonable thing to do, at least if you have a good GC.
> You recommend arrays to other people, not just to yourself, unless I'm > _gravely_ mistaken about the purpose of your article. Other people > who have much less experience than you and rely on your experience in > order to form their own, with the least possible pain. My experience > with people who try to implement things with arrays is that they end > up with very cons-like concepts once they outgrow the C "memory is > really an array of bytes" idea. The reason is simple: Arrays with the > properties we discuss are very, very hard to implement correctly.
It is not that hard. Still, I agree it is is best if arrays are well supported by the programming language or at least a good library.
> If we _are_ talking about what a compiler designer would do for a > language that natively supports these things in such a way that > programmers would never _see_ the array, the kind of reasoning you > have provided is completely misplaced. I have therefore assumed that > you do _not_ argue about primarily what a compiler writer would do, > although CDR-coding _is_ at that level.
My argument is a little of all of:
(1) I think high-level programming languages should be built around array support rather list support. (Actually, I prefer language primitives to work on generalized sequences, with arrays just being the default implementation.)
(2) I think programmers should (in general) use arrays rather than lists, even in Lisp-like lanuages, at least when the number of elements is large and performance is important.
(3) As a consequence of (1) and (2), I don't see a lot of justification for cdr-coding, at least as something one would do in hardware.
> I think you need to explain how you ended up with 9 words for the > list. There is of course no header word for cons cells if you are > going to implement a very basic idea in a language with this.
I discussed this case in the following paragraph.
> Type information is stored much more efficiently if you care about > these things, or you allocate them in blocks that only hold cons > cells. That is, arrays of cons cells, if you want. Likewise, > your arrays would have type information in a similarly more > efficient way.
Agreed, at least for a Lisp-like language. For a object-oriented language like Java, with many different types where fewer of them predominate, it may make more sense to always have a header word. (Of course CL is object-oriented. However, it differs from Java in that certain pre-defined types are used very heavily so it makes sense to optimize for those types in a way that makes less sense for Java.)
> Here's how I count. First, the data array. It has one word per > element, but it is aligned at and always allocate in some useful > chunk, like 4 words. This is to make allocation significantly > cheaper. Since the data array may need to be moved when grown, there > is a meta-object that contains the allocation size, the displacement > offset, the active size, and a pointer to the data array, or 4 words, > one allocation unit.
The displacement offset is not needed for adjustable arrays. It is only needed for displaced arrays. And you can place the allocation size at the start of the data array. (It would be needed there anyway if memory management code needs to scan memory consequtively.) That gives only 2 words for the header meta-object.
> | If you use a memory allocation scheme not requiring any header words, > | then the extensible 3-element array takes 6 words, the non-extensible > | array takes 4 words, and the list version takes 6 words. The array > | is equal to the list in space.
> Yes, but only at enormous expense in allocation costs with such a fine > granularity. Also, you really do not want your array to move around > if it grows, or you lose the idea of identity of lists.
But the array object doesn't move if you use a header meta-word, only the data array moves.
Of course if you want to preserve identity of lists for linked links, you will also need an initial header cons.
> The overhead is twice as much _after_ the overhead of arrays have been > dealt with, including that copying-when-doubling scheme, which out of > necessity must allocate memory for the same element many times over as > the list grows.
As I said: Only twice total, on average.
> By cleverly allocating allocation units for the data > array from a single pool that is not used for anything else, the same > way cons cells are allocated, you can grow your list painlessly and > linearly without copying until you allocate another array.
No such tricks need to painlessly and *linearly* grow the array, even *with* copying. Not to say that your scheme is a bad one - I agree it a very reasonable thing to do.
> The interesting thing is, however, what the languages look like after > you decide to represent lists as arrays that way. E.g., cdr-ing down > a list can easily be emulated with displaced arrays, i.e., arrays that > really use another array's data for its elements, but starting from an > offset for elemen t0.
Agreed. However, my preference is for languages where the programmer does not usually explicitly "cdr" down a list - or even index them. I think languages should encourage programmers to work with arrays as a unit, and language primitive should encourage that. I'm in favor of "array languages" or what we might loosely call APL-style languages.
In the context of Common Lisp, I prefer the "series" design rather than the "loop" design. -- --Per Bothner p...@bothner.com http://www.bothner.com/~per/
In article <joswig-54D876.13154213122...@news.is-europe.net>, Rainer Joswig <jos...@corporate-world.lisp.de> wrote:
> In article <917vho$...@web.nmti.com>, pe...@abbnm.com (Peter da Silva) > wrote: > > In article <m24s09hy46....@kelso.bothner.com>, > > Per Bothner <p...@bothner.com> wrote: > > > Note the context of this discussion: Is cdr-coding a good idea? > > CDR-coding is an implementation technique for hiding an array structure > > behind a list structure. > CDR-coding is an implementation technique for optimizing > the space usage and the locality of lists.
I see these two statements as being equivalent in the context of this discussion.
-- `-_-' In hoc signo hack, Peter da Silva. 'U` "Milloin halasit viimeksi suttasi?"
> In article <joswig-54D876.13154213122...@news.is-europe.net>, > Rainer Joswig <jos...@corporate-world.lisp.de> wrote: > > In article <917vho$...@web.nmti.com>, pe...@abbnm.com (Peter da Silva) > > wrote: > > > In article <m24s09hy46....@kelso.bothner.com>, > > > Per Bothner <p...@bothner.com> wrote: > > > > Note the context of this discussion: Is cdr-coding a good idea?
> > > CDR-coding is an implementation technique for hiding an array structure > > > behind a list structure.
> > CDR-coding is an implementation technique for optimizing > > the space usage and the locality of lists.
> I see these two statements as being equivalent in the context of this > discussion.
Symbolics was concerned with space usage. You don't get constant time access of arrays. Accessing the nth element of a cdr-coded list is not equivalent in terms of speed to accessing the nth element of an array.
>> In article <4aea2iwyp....@beta.franz.com>, Duane Rettig >> <du...@franz.com> wrote: >chu...@best.com (Chuck Fry) writes: > >> >> And with 64 bits to play with, who's to say we can't spare a >> couple for >> CDR-coding? > >Let's not make this mistake >> again. When IBM (and Amdahl) created their XA >architectures >> for the 370 compatibles (extending the address range from 24 >> >bits to 32 bits), several programs suddenly broke. >> Can anyone come up with a plausible program in the next two >> decades, to run on conventional architectures, that will require an >> address space between 60 and 64 bits? If you say "let's map >> everything into a flat space" you will need more than 64 bits. If >> you stick to conventional addressing, including memory mapping >> individual objects, you use a lot less than 64 bits.
Paul> Take a plastic ball and cover the bottom half with microwave Paul> detecters and A/Ds. Hand wave that some horrid reasearcher will Paul> come up with a way to get 8 bits per sample, even at, say, Paul> 500GHz. Hand wave that the packages is about 5cm. So that is Paul> about 200 per ball, with out getting too thing about cramming Paul> them all in. So that comes to 1TB per sec.
Paul> That per 'ball'. You have a square Km of them, plus extras. Say Paul> a spacing of 2M, thats 500x500, 250,000. So that comes to 250 Paul> EB/sec... Now run it 24 hrs/days for a decade... Or, just take Paul> 10 hrs data for analysis. 64 bits does not go far here, just in Paul> data size.
Paul> Plus, the accounting software may need to be 64+ bits as Paul> well :)
The jump from 16 bit architectures to 32 was a _pretty_ big deal, as the typical "dynamic scope" of 32 bits, +/-2^31-1, is a whole lot bigger than +/-32767, and suffices to characterize a lot more problems.
Jumping from there to 64 bits is a _whopping_ big jump; while 32 bits isn't quite enough to represent national debts, it's not _far_ off, and whether we speak of 36 bits, 48, 60, or 64, it's all "quite large enough" to represent values that required compound data structures with only 32 bits to play with.
Long and short being: "No, the accounting software probably does _not_ need 64+ bits; they'll probably revalue the currency before that becomes relevant..."
There will indeed always be some "bleeding edge" applications that could use the maximum capabilities that any technology can ever offer; that doesn't mean that such technologies will be economically feasible, or that they will offer enough to "more mundane" applications to economically justify their development.
But where 8 bits sufficed for some applications, and having more is gravy, the same is true for successive generations of microprocessors, and 60 or 64 bits will "suffice" for many classes of applications regardless of whether we are thinking of numeric ranges or the sizes of memory addresses.
It should be eminently useful to take 64 bit architectures and waste a few bits on tagging; cutting ranges from 64 to 60 still leaves a useful range for _any_ application that has been deployable on a 32 bit architecture. -- (concatenate 'string "cbbrowne" "@hex.net") <http://www.ntlug.org/~cbbrowne/> Referring to undocumented private communications allows one to claim virtually anything: "we discussed this idea in our working group last year, and concluded that it was totally brain-damaged". -- from the Symbolics Guidelines for Sending Mail
> In article <joswig-54D876.13154213122...@news.is-europe.net>, > Rainer Joswig <jos...@corporate-world.lisp.de> wrote: > > In article <917vho$...@web.nmti.com>, pe...@abbnm.com (Peter da Silva) > > wrote: > > > In article <m24s09hy46....@kelso.bothner.com>, > > > Per Bothner <p...@bothner.com> wrote: > > > > Note the context of this discussion: Is cdr-coding a good idea?
> > > CDR-coding is an implementation technique for hiding an array structure > > > behind a list structure.
> > CDR-coding is an implementation technique for optimizing > > the space usage and the locality of lists.
> I see these two statements as being equivalent in the context of this > discussion.
The post points out that the discussion may be fundamentally flawed: lists and vectors are abstract data types, linked lists and arrays are implementations. Up until that post, the data type and implementation were being confused. -- Joseph J. Pfeiffer, Jr., Ph.D. Phone -- (505) 646-1605 Department of Computer Science FAX -- (505) 646-1002 New Mexico State University http://www.cs.nmsu.edu/~pfeiffer VL 2000 Homepage: http://www.cs.orst.edu/~burnett/vl2000/
* Per Bothner <p...@bothner.com> | Perhaps, but does it make sense?
Excuse me? Yes, it makes sense to read Lisp data that the application does not know the structure of a priori. For one thing, source _code_ matches that description. Data whose structure is only evident after having been read in as a complete expression also matches very well.
| Presumably you application has some idea of the structure of the data | it is expecting? (Even trees have structure.)
Yes, it knows they will all be Lisp objects. That's it. This is part of the significant charm with Lisp's lists.
| But that is just my argument - it is *not* much less efficient. If | you double the array as it grows, the "average" element is only copied | *a single time*. (Some elements may be copied n*log(n) times, but at | least half the elements have been copied at most once, at least | three-quarter of the elements have been copied at most twice, etc.)
There's something wrong with your math. The average number of times an element is copied is 1.0 if you have (- (1+ (expt 2 n)) m) elements in the final array with m elements in the initial allocation and approaches 2.0 asymptotically for (1+ (expt 2 n)) elements in the final array, since adding the last element will copy everything and end up with half the allocated array empty (short that one element).
| My argument is a little of all of: | | (1) I think high-level programming languages should ... | (2) I think programmers should ... | (3) As a consequence of (1) and (2), ...
I'd be interested in how you _arrived_ at your preferences, instead of just having something follow from some random personal favorites.
| (Of course CL is object-oriented. However, it differs from Java in | that certain pre-defined types are used very heavily so it makes sense | to optimize for those types in a way that makes less sense for Java.)
This is where you're losing me. I don't see a huge need for making a basic type more efficient in languages that _don't_ show a predominant use of that basic type. And why are we discussing Java? I thought it was already as array-friendly as you wanted languages to be? And Lisp already has adjustable arrays, so what's the point in redesigning them?
| The displacement offset is not needed for adjustable arrays. It is only | needed for displaced arrays.
Sigh. Why am I wasting any time on this? The displacement is needed for the "rest" (cdr) function that people who work on lists, however they are actually implemented, need very often. If you want to offer Lisp people something in terms of efficiency, you can't take away their primitives without changing the entire language under them. That's what I was trying to get at in the last parapraph I wrote.
| And you can place the allocation size at the start of the data array.
Yes, but that is a truly boneheaded decision as it means that the data array can no longer be specialized to contain a uniform data type -- say double-floats instead of pointers to double-floats (and you really want that if you go the way of arrays in the first place -- people even ask for specialized list types, and they won't stop if they know the underlying implementation is array-based), but must somehow have a special initial element or consist of a single word that destroys alignment of the rest of the array. This is just stupid. It's just like the old string representation that used the first byte to hold the length and couldn't hold more than 255 characters per string.
| (It would be needed there anyway if memory management code needs to | scan memory consequtively.)
Really? I thought memory management code followed pointers and chains from a root set if it were any smart. Randomly walking over memory it has no idea what contains just seems like a recipe for major disaster. Some conservative garbage collectors that are bolted onto languages that were never properly designed to deal with abandoned pointers seem to make do with interpreting machine words as pointers on a hunch, but I have always been exceedingly skeptical of the entire approach. If you keep the size of a vector in a well-known structure with the pointer to the data array, there is only one way to get at that data array, anyway.
| That gives only 2 words for the header meta-object.
And much more overhead and increased complexity and general madness. You have obviously never actually experimented with this in _Lisp_- like languages. I don't care about optimizing other languages for adjustable arrays, as we already have efficient adjustable arrays in Common Lisp, so there's no point in reinventing the wheel, and using adjustable arrays to represent lists requires concern for the usage of lists so optimized. I'm sure you can optimize this stuff to death in Java, but Java doesn't have any need for cdr-coding or more efficient storage of lists in particular, either, so I must admit to completely failing to see the whole point of this exercise, except that you get to argue for your preference for array-based languages, which I do not share, and am unlikely to begin to share just because of this.
| Of course if you want to preserve identity of lists for linked links, | you will also need an initial header cons.
I'm sorry, but is this a real argument? You're talking about data arrays that move when they grow at the tail end, right? Lists can grow at the tail end without any "header cons" simply because we just overwrite the cdr that is nil in the final cons with a pointer to the cons cell that holds the new element (and a new nil). Lists thus grown retain their first cons cell, which does not move, and thus retains identity at the pointer level. You do need a meta-object for the header of arrays because the data array moves the address of the first element when it is moved.
| As I said: Only twice total, on average.
Well, you actually said *a single time*, but if you keep doubling it every time you say it, I'm happy. (Sorry, but I found that fuuny. :)
| However, my preference is for languages where the programmer does not | usually explicitly "cdr" down a list - or even index them.
Well, I think it is preferable to ask people who prefer the languages they are trying to optimize for their ideas on how to optimize them instead of optimizing the programmers to use some language _they_ do not prefer simply because the optimizers do.
I'm confused about your purposes, I guess. Promote APL if you like, but then I don't see why you're even offering suggestions for how to implement lists efficiently in Lisp. This is mystifying to me.
#:Erik -- The United States of America, soon a Bush league world power. Yeee-haw!
> (2) I think programmers should (in general) use arrays rather than > lists, even in Lisp-like lanuages, at least when the number of > elements is large and performance is important.
I think you are correct if (a) the language supports variable length arrays or (b) the user knows how many elements he will have or (c) random access of the sequence data is important and (d) no large amountof sharing of sequence data is needed and (e) no insertion of elements into the sequence happens. In other words, it depends. Also, note that what is supported by the language is often what gets coded by a user. If every sequence looks like an array, the user will avoid insertion and sharing of substructure and start assuming that O(1) access of any element is present. This, of course, makes some code very ugly, but since the user doesn't see any obvious alternative, he's stuck.
> (3) As a consequence of (1) and (2), I don't see a lot of justification > for cdr-coding, at least as something one would do in hardware.
If you read my post carefully, you would have seen that nowhere did I ask whether or not there should be hardware support for such a feature. In fact, history has shown that language specific features for hardware are usually a bad idea. What I asked was given the mismatch of cache speed and CPU speed, would a software implementation of CDR-coding be a good implementation technique. I was hoping that someone might pop up with some discussion that actually involved hardware and cycle count. Instead I get straw men about hardware support for the feature.
To me the only valid argument I have seen so far is the one that says that there are better uses for the tag bits. This I can believe. Show me cycle counts that demon- strate that CDR-coding would add time to the "straight-line" path for list traversal and I will believe that CDR-coding is still a bad idea. Arguments that simply assert that arrays are "generally" better than lists are not convincing. Lists AND arrays have been around since the advent of computers. They are both useful under different circumstances and I don't think that you've given enough evidence to show that an array is the best choice for the default sequence type for a language. Certainly, it seems to be the "most natural" choice for Fortran, C, and Java language users, given that construction, traversal, and manipulation of lists is clunky in those languages (Strangely enough C++ with the STL and operator overloading sidesteps this objection to a certain extent, but I attribute that to luck more than any conscious design effort) but that's hardly a convincing argument overall.
Erik Naggum <e...@naggum.net> writes: > Excuse me? Yes, it makes sense to read Lisp data that the application > does not know the structure of a priori. For one thing, source _code_ > matches that description.
Your point seems to be that it is a bad idea to use array to represent complex nested structure, such as Lisp-style source code. I agree this arrays are at a disadvantage here, since Lisp source code uses many variable-sized and short sequences. However, the disadvantage is minor (at most a small constant factor), and it doesn't really matter, since source code is relatively small and only kept around during input and compilation. It is more interesting what happens with large files and large sequences. That is where arrays win and (I think) lists have the disadvantage.
> Yes, [the application] knows they will all be Lisp objects. > That's it. This is part of the significant charm with Lisp's lists.
An application has to know more. Presumably it intends to *do* something with those Lisp objects.
> There's something wrong with your math. The average number of times > an element is copied is 1.0 if you have (- (1+ (expt 2 n)) m) elements > in the final array with m elements in the initial allocation and > approaches 2.0 asymptotically for (1+ (expt 2 n)) elements in the > final array, since adding the last element will copy everything and > end up with half the allocated array empty (short that one element).
I believe your're right.
> | My argument is a little of all of: > | > | (1) I think high-level programming languages should ... > | (2) I think programmers should ... > | (3) As a consequence of (1) and (2), ...
> I'd be interested in how you _arrived_ at your preferences, instead of > just having something follow from some random personal favorites.
First, I've tried to argue that arrays are a better general-purpose data structure than lists: You can do things (indexing) that you can't to efficiently with lists. Sequential traversal (including merging and appending to the end) is perhaps most common use of sequences, and boths arrays and lists handle that well, but arrays make it easy to traverse from either end. Arrays have better locality of reference, and they take less space (in general - short arrays and extensible arrays just after resizing might lose out spacewise to lists, but not by much). Lists have some advantages. Some of the advantages are historic (i.e. Lisp is based on lists); other may be inherent. For example some algorithms (tree-sort? - merge-sort? I forget the name) need linked lists. Other applications may be more convenient to code using lists, especially in languages like Lisp which historically are oriented towards Lisp. So, generally arrays make a lots of sense, and if you are going to represent a sequences of values in your program, in general arrays are more likely to be the better answer than lists. Now if you are going to be using Lisp, this may be a little different, just because of Lisp's support for lists. But from an abstract data-structure point of view (and note this is comp.arch, not just comp.lang.lisp), arrays usually are to be preferred over lists.
Secondly, if I was designing a programming langauge from scratch, I would use arrays rather than lists as the fundamental data type for implementing sequences, partly for the reasons above. The other reason is that I like languages that work on collections as a unit, and provide operations than work on collections (sequences, sets, relations, trees). Such languages can be very powerful and compact. I also think they may be easier to use than languages that encourage recursion to work on lists. I like side-effect free languages, or at least languages that encourage a side-effect free style. In such a language, the benefit of using lists instead of arrays is that it is easy to use recursion to define your algorithms. But using recursion to scan through a list is, I think, a bad idea, because it is harder to understand than iteration. Recursion should be used for recursive data structures. But defining a sequence as a recursive data structure (a chain of conses) is wrong - you're creating recursion where you don't need it.
> This is where you're losing me. I don't see a huge need for making a > basic type more efficient in languages that _don't_ show a predominant > use of that basic type.
Neither do I. My point was the different trade-off of Java and Lisp.
>. And why are we discussing Java? I thought it > was already as array-friendly as you wanted languages to be?
Absolutely not. But that is a different discussion.
> And Lisp > already has adjustable arrays, so what's the point in redesigning them?
I'm not. You're the one who insists on adding a displacement pointer to all adjustable arrays.
> Sigh. Why am I wasting any time on this? The displacement is needed > for the "rest" (cdr) function that people who work on lists, however > they are actually implemented, need very often.
I guess the problem is we're discussing two different things here: Arrays vs lists in abstract, and how they are used in Lisp. When I work on arrays, I don't find the need for a rest function a lot. If you need the rest/cdr function a lot, then perhaps you shouldn't be using arrays. Or re-think your algorithms.
> Yes, but that is a truly boneheaded decision as it means that the data > array can no longer be specialized to contain a uniform data type --
Of course it can. This is what Java primitive arrays are: A non-adjustable array of some primtive type (for exampe double-floats). These are comonly implemented as an object header followed by a length followed by the data (words, bytes, or whatever).
> This is just stupid.
There are all sorts of tradeoffs, in memory management especially.
> | (It would be needed there anyway if memory management code needs to > | scan memory consequtively.)
> Really? I thought memory management code followed pointers and chains > from a root set if it were any smart.
Well, for example a compacting phase might be easier if you could scan memory in order.
> Randomly walking over memory it has no idea what contains
If you have the appropriate header words, you would know what the memory contains.
> | That gives only 2 words for the header meta-object.
> And much more overhead and increased complexity and general madness. > You have obviously never actually experimented with this in _Lisp_- > like languages.
Of course I have. However, I have done most of my work in the context of the Java VM, which has other tradeoffs than if you can custom-design your own memory-management. There are other Lisp implementations that use the Boehm conservative collector (or similar), perhaps because interoperability with C is desired, or because they would rather used an existing well-engineered GC instead of writing their own.
> Lists can grow at the tail end without any "header cons" simply > because we just overwrite the cdr that is nil in the final cons > with a pointer to the cons cell that holds the new element (and a > new nil).
Not if the list can start out empty.
> | As I said: Only twice total, on average.
> Well, you actually said *a single time*, but if you keep doubling it > every time you say it, I'm happy. (Sorry, but I found that fuuny. :)
Twice referred to the number of memory accesses to write the array (move once on average plus initialization). I'll bump that up to thrice, if you like.
> I'm confused about your purposes, I guess. Promote APL if you like, > but then I don't see why you're even offering suggestions for how to > implement lists efficiently in Lisp. This is mystifying to me.
I am not talking about how to implement lists efficiently in Lisp, if by "lists" you mean the "list" type of Common Lisp, except that I don't think there is much point in implementing cdr-coding, at least in hardware. What I am talking about how to implement sequences or lists in the more generic sense. -- --Per Bothner p...@bothner.com http://www.bothner.com/~per/
> Your point seems to be that it is a bad idea to use array to represent > complex nested structure, such as Lisp-style source code. I agree > this arrays are at a disadvantage here, since Lisp source code uses > many variable-sized and short sequences. However, the disadvantage is > minor (at most a small constant factor), and it doesn't really matter, > since source code is relatively small and only kept around during input > and compilation. It is more interesting what happens with large files > and large sequences. That is where arrays win and (I think) lists > have the disadvantage.
I think this paragraph sums up what is wrong with your argument for Lisp. You seem to be assuming that people typicially use huge, shallowly-nested lists which would `obviously' be better represented in some array form. Well, I just don't think that's true for well-written Lisp programs. I don't have general evidence, but in programs I've written I've done some measurement of this kind of thing, because I was concerned about things like search times, and I found mean lengths of about 6 elements and mean nesting depths about the same.
I suspect this isn't the case for Java, but Java programs perhaps don't deal with the same kind of data that Lisp ones do.
> Erik Naggum <e...@naggum.net> writes: > > You recommend arrays to other people, not just to yourself, unless I'm > > _gravely_ mistaken about the purpose of your article. Other people > > who have much less experience than you and rely on your experience in > > order to form their own, with the least possible pain. My experience > > with people who try to implement things with arrays is that they end > > up with very cons-like concepts once they outgrow the C "memory is > > really an array of bytes" idea. The reason is simple: Arrays with the > > properties we discuss are very, very hard to implement correctly.
> It is not that hard. Still, I agree it is is best if arrays are well > supported by the programming language or at least a good library.
What's wrong with STL?
If you have to use C++ anyway, STL has a lot of very useful ideas.
> My argument is a little of all of:
> (1) I think high-level programming languages should be built around > array support rather list support. (Actually, I prefer language > primitives to work on generalized sequences, with arrays just being > the default implementation.)
I.e. STL <vector> etc.
Terje
-- - <Terje.Mathi...@hda.hydro.com> Using self-discipline, see http://www.eiffel.com/discipline "almost all programming can be viewed as an exercise in caching"
* Per Bothner <p...@bothner.com> | I am not talking about how to implement lists efficiently in Lisp, if | by "lists" you mean the "list" type of Common Lisp, except that I | don't think there is much point in implementing cdr-coding, at least | in hardware. What I am talking about how to implement sequences or | lists in the more generic sense.
Well, OK. cdr-coding is all about efficient implementation of lists in Lisp, and the reason I insist on that displacement offset that you reject out of hand even with the explanation is that the context of the whole thread has been how to store lists more efficiently, while obviously keeping all the listness qualities. I'm sorry you have missed this.
I don't have a problem with your desire to use arrays. I use them all the time myself, and so do other Lispers. I don't see a conflict, and I don't see a need to optimize Lisp any more towards arrays, as the support for the abstract type "sequence" already there, too.
It seems to me that you think "Lisp" is the Scheme branch of things, which I barely think is a Lisp at all -- I think Scheme is an Algol with Lispy syntax and some functional properties. There are many reasons I loath Scheme, but one of them is certainly that it exposes the choice of underlying types so much in the interest of giving you that Algol feel of efficiency and typeness.
#:Erik -- The United States of America, soon a Bush league world power. Yeee-haw!
On Wed, 13 Dec 2000 06:57:46 GMT, Ketil Z Malde <ke...@ii.uib.no> said:
> You're probably right, but a good argument against it is that USENET > actually works in a responsive and reliable manner. WWW seems always > to be too slow and produce too many 40x's.
One of the reasons for that is that not all news servers get all of Usenet. Thank goodness.
Another argument against changing it to fetch-by-demand is that the way things are feeded now, it works like crude pre-caching.
(Fetch-by-demand can be implemented in a smart way, too, of course, lessening the problem, and even combined with pre-caching. Isn't that nice.)
-- In the beginning was the Bit, and the Bit was Zero. Then Someone said, Let there be One, and there was One. And Someone blessed them, and Someone said unto them, Be fruitful, and multiply, and replenish the Word and subdue it: and have dominion over every thing that is.
On 13 Dec 2000 10:48:19 +0100, Espen Vestre <espen@*do-not-spam-me*.vestre.net> said:
> Erik Naggum <e...@naggum.net> writes: >> I expect that only some headers will be passed around very shortly, >> with really clever cache propagation protocols that makes clients >> retrieve an article by message-id on demand, ultimately from the >> originating server, cutting traffic and disk requirements down to what >> is actually used, and killing half the stupid spam problem, too.
This is a bit too crude, unless it's your local server doing the fetching for you, not the client (user agent). But that may be what you mean with "clever cache propagation".
However, the originating/injecting server must have the bandwidth or other capacity for dealing with external requests for articles or even storing articles for as long as necessary (that is, until some reasonable amount of time has passed). If those requirements aren't met, the current model seems much better to me.
Considering that there are great variances in how long articles are stored from news provider to news provider, it seems likely that there is a significant amount of users who want to read older articles. It isn't unreasonable to assume there will be a good amount of arbitrary requests for older articles at the originating server, say up to a month after the article was posted. Someone with a large user/poster base will have to upgrade their injecting servers. :)
Another issue is that if the injecting server is somewhere remote in Australia and your client is in Norway, response will be slow, reducing the usefulness of Usenet compared to the web.
Ketil Z Malde has a point when he talks about the responsiveness of today's Usenet; it's very important for the user that the articles requested appear "immediately". (There hasn't been much research on Usenet, but I believe it's safe to apply relevant aspects of usability studies of the web.)
> I've been hoping for such a thing for about 5 years now, but is there > any work goin gon...?
From what I understand, it isn't uncommon to deal with header-only feeds, and Diablo supports fetching Message-IDs from other servers by demand (automatic redirecting to the other news server). The latter seemed to work well when I tested it when the news servers in the chain were topologically close. I didn't test with servers on the other side of the globe, though.
-- In the beginning was the Bit, and the Bit was Zero. Then Someone said, Let there be One, and there was One. And Someone blessed them, and Someone said unto them, Be fruitful, and multiply, and replenish the Word and subdue it: and have dominion over every thing that is.
In article <1b7l53pvqd....@viper.cs.nmsu.edu>, Joe Pfeiffer <pfeif...@cs.nmsu.edu> wrote:
> The post points out that the discussion may be fundamentally flawed: > lists and vectors are abstract data types, linked lists and arrays are > implementations. Up until that post, the data type and implementation > were being confused.
Good call, that man. Thanks for putting what I meant into clearer terms.
-- `-_-' In hoc signo hack, Peter da Silva. 'U` "Milloin halasit viimeksi suttasi?"
In article <m3elzehe1q....@localhost.localdomain>, m...@bewoner.dma.be says...
> Thanks for reminding me. Most of my stuff broke. But the mess was even > worse: they had to put in mode bits to allow older programs to run > unchanged. In a system like VM you ended up with two mode bits (EC/BC > and XA or something like that? I've forgotten most of this stuff and I > can't find my yellow booklet) and programs that only ran with one > combination of them. It was very nice to have eight bits to work with > with each pointer. Perhaps architecture designers should reconsider > giving user programs some bits in the address space.
This is not an issue for architecture designers anymore, because you can use paging to remap the addresses the way you wish. This will only work with mid to top bits of course.
You can take linux and do the remapping the way you wish if it's really that important.
Can't resist a discussion about old IBM 'frames and lisp. I did my lisp homework assignments on line-at-a-time TSO :-)
If I remember correctly, back around 1980 or so, if you asked an IBM engineer why they "needed" to move past 24-bit addressing to 31-bit, the reason given was unrelated to application programming. That is, noone needed to write programs which required that much memory, except for one very special group of programmers.....the programmers who wrote IBM's database management systems and operating systems.
Along with the big, new address spaces (actually just before them) came additions to the instruction-set which permitted and protected the ability to switch virtual storage context from one virtual address-space to another (note that this is orthogonal to the ability to switch from "kernel-mode" to "user-mode"). In practical terms, this permitted the OS itself to spread ITS OWN data-structures across multiple address spaces in order to circumvent the 16MB architectural limit. DBMS's did the same thing.
The design and performance "win" in both cases was to permit the OS's paging routines and the firmware's virtual-storage enhancements, which were highly-optimized, to manage the buffering of data and data-structures in real-memory. The "applications", ie. the bulk of the OS and DBMS, were thereby relieved of these duties: coding and design were simplified.
I should mention that due to previous architecture/design choices, not all of the older 16MB address space was available to the OS and/or DBMS (and/or application) for exclusive, private use. Thus the effective available virtual storage was less than 16MB, sometimes considerably less, dependent on local OS changes as well. Even more reason to enlarge it and to permit access to multiple virtual address spaces.
Any IBM folks want to chime-in here? Are all of the System/370 hands extinct already? I haven't touched a 'frame in 9 years (don't miss them either :-))
* Nick Geovanis The data were not anything that would lead anybody | IT Computing Svcs to conclude the Fed would do anything different | Northwestern Univ than what they already believed they were likely | n-geova...@nwu.edu to do. - Chief Stock Market Analyst, 12/14/2000 +-------------------> Cantor Fitzgerald and Co.
Nicholas Geovanis <nick...@merle.acns.nwu.edu> writes: > I should mention that due to previous architecture/design choices, not all > of the older 16MB address space was available to the OS and/or DBMS > (and/or application) for exclusive, private use. Thus the effective > available virtual storage was less than 16MB, sometimes considerably less, > dependent on local OS changes as well. Even more reason to enlarge it and > to permit access to multiple virtual address spaces.
> Any IBM folks want to chime-in here? Are all of the System/370 hands > extinct already? I haven't touched a 'frame in 9 years (don't miss > them either :-))
a major MVS issue (in 24bit addressing land) was that it had inherited from SVS, MVT, PCP (back to early '60s), etc a paradigm/design where the supervisor occupied the same address space as the application programs. This was slightly mitigated in the SVS->MVS (in the early '70s) where the restriction that all applications programs and all supervisor functions occupy the same 24-bit address space was slightly lifted (single virtual storage ... became multiple virtual storage ... where there was a 16mbyte address space for each application ... with the MVS supervisor and various subsystem components residing in each one).
However, by the late-70s having all supervisor functions in the same address space as the application along with various supervisor subsystem requirements were again starting to serverely strees the 24bit address limit.
while some of the MVS gurus might have believed that they were the biggest address space hogs in the world, some MVS installations were having difficulty leaving even 6mbytes (of the 16mbytes) available to application program. There were actual applications in the '70s that demonstrated large address space appetites. Some of these were large database transaction subsystems that had to exist in the tiny space left in the 16mbytes space after the MVS supervisor and subsystem requirements were met.
In the initial port of apl/360 to cms/apl ... the apl workspace limited was opened up from typically 32k-64k bytes to just under 16mbytes. There were a number of applications that actually took advanage of the increased workspace size.
One of those were the HONE service. This was the service in the '70s and '80s that supported world-wide sales, marketing, hdqtrs, and field support operations. One example, starting sometime in the mid '70s, IBM mainframe orders became so complex that it couldn't be done manually, a HONE application was needed to fill-in the order. Another big use of HONE was for economic planning & modeling ... much of the "what-if" processes done today on PC spreadsheets were performed in APL.
In '77, the US HONE operations were consolidated in a single location in california with, what was at the time believed to be the largest single-system image operation in the world (cluster of SMP processors sharing large disk farm). In '78/'79, the single-system image was replicated in dallas and boulder providing disaster survivability support (in case of national disaster, like an earthquake in cal.). This was in addition to various HONE clones that resided in numerous countries around the world.
Almost the entire HONE "experience" was delivered first on cms/apl and then later on apl/cms.
* Jan Ingvoldstad <j...@ifi.uio.no> | This is a bit too crude, unless it's your local server doing the | fetching for you, not the client (user agent). But that may be what | you mean with "clever cache propagation".
I consider today's USENET distribution a crude cache propagation mechanism and propose something that uses much less bandwidth and disk space while it maintains the principle of nearby caches. A few years ago, for instance, the five largest USENET sites in Oslo, Norway, were all located within a square mile, duplicating each other with great precision and speed because they had different user bases and owners. The extant news software could not propagate news articles by any other means than flooding every server with everything, but if the news protocols were only slightly smarter, they could have waited until they needed the article body before they requested it, if they had the headers that they could show to users. Users would not have noticed those delays, and if there were any millisecond delays, it would be only for the first reader that day/week/whatever. Now, scale this up and consider large (network-topological) regions cooperating to avoid duplicating news articles and traffic needlessly. How many USENET servers are there around the core interchange points these days? Distributed, redundant load balancing is not exactly news to people who care about distributed systems, but we still have people who worry needlessly if they cannot have their local USENET feed. If you get upset because you cannot read news because a remote server is down, you are a USENET addict and need treatment, not better servers.
The problem with the whole current design is, however, that you do not know who the originator (initial injector) and owner is and you cannot request the article except from another nearby cache, so if you have a mesasge-id that some article is a response to, you are basically hosed if you have not been so lucky as to have it flood by you. That does not mean there _is_ no originator and owner, so I consider it useful to think of the injecting user as _virtually_ reachable.
| However, the originating/injecting server must have the bandwidth or | other capacity for dealing with external requests for articles or even | storing articles for as long as necessary (that is, until some | reasonable amount of time has passed). If those requirements aren't | met, the current model seems much better to me.
The current in-flow rate is a problem. I have never heard of any USENET site that has out-flow problems for articles originating at their site. Keeping your "own" articles around for months should be the smallest issue compared to keeping everybody else's articles around for weeks and days. You can increase your out-flow a hundred- fold and still be _far_ short of the current in-flow, and this makes it possible for leaf sites to have (relatively speaking) _very_ small caches while their preferred nearest cache (akin to the site they get their feed from today) holds on to articles as long as one of their many leaf sites request it. The great thing with a design like this is that you can upgrade from the leaf sites "inward" to the core servers, because as long as there are major core servers who still operate in the "old way", there will be much less pressure on the injecting site. Given how slowly changes occur in the USENET core technology, the chances are very, very good that there will remain a number of huge servers who can and will act as proxies for the many injecting sites that will never see any significant network load.
| Considering that there are great variances in how long articles are | stored from news provider to news provider, it seems likely that there | is a significant amount of users who want to read older articles.
Yes, that's the idea, to keep the original article available longer.
| It isn't unreasonable to assume there will be a good amount of | arbitrary requests for older articles at the originating server, say | up to a month after the article was posted. Someone with a large | user/poster base will have to upgrade their injecting servers. :)
Methinks you have lost all sense of proportion and need to back up and look at the numbers you give for the current situation and its growth and consider who the people are who contribute the volume of data that you refer to. Yes, there are some large sites, perhaps responsible for as much as 1/1000th of the total volume on USENET each, but they _already_ have 1000 times the capacity to handle "injections" and if you ask them a 100 times for every article they have published, they are still at 1/10th their old bandwidth, and they aren't going to fill the remaining 9/10th with requests from other servers, either.
| Another issue is that if the injecting server is somewhere remote in | Australia and your client is in Norway, response will be slow, | reducing the usefulness of Usenet compared to the web.
Really? How come I can pick up the article from any number of very nearby caches today? Hmmm. Mystifying! How _did_ they get there?
| Ketil Z Malde has a point when he talks about the responsiveness of | today's Usenet; it's very important for the user that the articles | requested appear "immediately". (There hasn't been much research on | Usenet, but I believe it's safe to apply relevant aspects of usability | studies of the web.)
Yeah, I think you guys have got it _exactly_ right. Of course I'm out to destroy USENET and its usability when I suggest that we invent a better way to propagate articles. Of course I'm trying my very best to screw with the minds of people who implement the software so _they_ also think "Hey, this USENET thing really was a bad idea to begin with, so let's just do something utterly braindamaged that will kill it and hopefully everybody using it, too". Get a _grip_, guys!
If you don't understand cache propagation mechanisms, say so. If you cannot even _imagine_ that somebody out there have tried to think about the way USENET propagates _while_ trying to keep it working for its users, I suggest you try to insult people openly instead of just assuming they have extra spicy garlic meatballs for brains. Sheesh.
Today's USENET propagation model does not try to keep track of where articles are read, that is the task of local news admins, most of whom take some pride in providing "everything" to their users. If we did not ship the articles until they were read, only enough header stuff that people could see newsgroups listings and stuff, the traffic would change patterns to adapt to the readership instead of the global flood algorithm used today. This would cause an increased ability to read "fringe" newsgroups (it's always a hassle to get a news admin to get another weird newsgroup hierarchy, because only one group is too much work and the whole hierarchy is too much data), with actual _user_ input to the distribution of news articles.
| From what I understand, it isn't uncommon to deal with header-only | feeds, and Diablo supports fetching Message-IDs from other servers by | demand (automatic redirecting to the other news server). The latter | seemed to work well when I tested it when the news servers in the | chain were topologically close. I didn't test with servers on the | other side of the globe, though.
I'm aware of Diablo and strongly encourage further development along those lines, but as the network of servers picks up messages, they will naturally be cached many places along the way, not very much different from today's system. Having to follow a chain of forwarding servers to get a particular article is therefore very unlikely, unless you read "fringe" newsgroups that nobody else in your vicinity reads. When you do that, you might also well tolerate longer access times.
#:Erik -- The United States of America, soon a Bush league world power. Yeee-haw!
> What I asked was given the mismatch of cache speed and CPU speed, > would a software implementation of CDR-coding be a good > implementation technique.
This is a reasonable question. CDR-coding could conceivably be a useful implementation techique in certain environments optimized for large Lisp code bases. That assumes the effort to implement it is reasonable; since I'm not convinced you could not get a better payoff by using arrays in place of lists, at least in those place where cdr-coding would make difference. But that might require re-writing working code, which I can see people would rather not do!
> [List and arrays] are both useful under different circumstances
Agreed.
> and I don't think that you've given enough evidence to show that an > array is the best choice for the default sequence type for a language.
Well, reasonable people can differ on this one.
> Certainly, it seems to be the "most natural" choice for Fortran, C, > and Java language users, given that construction, traversal, and > manipulation of lists is clunky in those languages
Using Lisp-style lists is quite straight-forward in Java. What it lacks is a standard pre-defined list class based on conses. Using lists in languages that lack a garbage collector is likely to be more tedious. -- --Per Bothner p...@bothner.com http://www.bothner.com/~per/
Tim Bradshaw <t...@tfeb.org> writes: > You seem to be assuming that people typicially use huge, > shallowly-nested lists which would `obviously' be better represented > in some array form. Well, I just don't think that's true for > well-written Lisp programs. I don't have general evidence, but in > programs I've written I've done some measurement of this kind of > thing, because I was concerned about things like search times, and I > found mean lengths of about 6 elements and mean nesting depths about > the same.
I think this actually supports my point, since arrays of length 6 are more space-efficient than lists of length 6. (With some quibbles about implementation techniques for adjustable vs non-adjustable arrays.) I agree it wouldn't make that much difference either way for short lists.
One might also use data structures differently in an array language. The classic example is a matrix, which is more efficiently represented using a single vector in row-major order, rather than a nested vector of vectors. Another example is using two arrays to represent a tree.
> I suspect this isn't the case for Java, but Java programs perhaps > don't deal with the same kind of data that Lisp ones do.
Probably not - and good Java programmers would be likely to structure a solution differently than good Lisp programmers would. -- --Per Bothner p...@bothner.com http://www.bothner.com/~per/
... oh yes, and a somewhat separate issue for the 370 ... in addition to virtual 24bit address was the issue of real 24bit address. With sufficient concurrent applications it was possible to start to stress the 16mbyte real storage limits (and possibly precipitate page thrashing).
many 3033 shops in late '70s (basically a souped up 370/168) started seeing these real-storage constraint problems.
the issue was how to address the problem.
CCW (i.e. i/o transfer) already supported 31bit real address with IDALs.
To get some slight relief, the 3033 introduced a hack for getting more than 16mbyte real storage. The 370 pagetable entry is 16bits, 12bits for specifying real page number (when combined with 12bit, 4k pages, yields 24bit addressing), an invalid bit, a protection bit, and two unused bits ... something like
NNNNNNNNNNNNPxxI
where "xx" are the unused/undefined bits. The 3033 hack was to use the two "xx" bits and prefix them to the 12bit page number to allow addressing up to 2**14 real pages ... or 2**26, 64mbytes of real storage. Executable code was still limited to 24bit virtual addresses but it was possible to allocate virtual pages in real storage above the 24bit line by setting the appropriate bits in the page table entry. And of course, the standard 370 CCW IDALs already had 31bits available for addressing real storage in i/o operations.
cross-memory services was also introduced with the 3033. in an attempt to help get some of the supervisor subsystem functions out of the same address space as the application (at least get things to the point where maybe a whole virtual 8mbytes was available to applications) ... and not to have a significant downside impact on the MVS "address-pointer" parameter paradigm, these supervisor subsystem functions had to reside in their own address space while still directly supporting services requiring addressing of the application virtual address space. cross-memory services introduced new addressing modes allowing instructions to address virtual storage different than the virtual address space that they were executing in.
> Secondly, if I was designing a programming langauge from scratch, I > would use arrays rather than lists as the fundamental data type for > implementing sequences, partly for the reasons above. The other > reason is that I like languages that work on collections as a unit, > and provide operations than work on collections (sequences, sets, > relations, trees). Such languages can be very powerful and compact.
You might be interested in NIAL (Nested Interactive Array Language)
-- Paul W. DeMone The 801 experiment SPARCed an ARMs race of EPIC Kanata, Ontario proportions to put more PRECISION and POWER into dem...@mosaid.com architectures with MIPSed results but ALPHA's well pdem...@igs.net that ends well.