* Barry Margolin <bar...@bbnplanet.com> | I'm not familiar with the Pentium III architecture (I haven't really | studied an instruction set since the Z80). Could you give some examples | of the features you're talking about?
I'd like to, but that needs delving back into the documentation and doing quite a bit of research. I have not found time for it in the past few days, and a three-week break is coming up fast, so I have to disappoint you with long response times on this one.
| Are these things like the MMX instructions (those were in the Pentium II, | as well, weren't they)?
the simplest idea is to use published knowledge of the core architecture to schedule register updates, memory transactions, etc, vastly more efficiently. several new instructions have also been added to make life a lot easier for particular tasks -- but actually being able to use them well requires significant effort on both the part of the programmers and the compiler writers.
| I know it has lots of bit-twiddling instructions that are supposed to be | useful for graphics, and I think some DSP-related instructions, but | neither of these seem like they would be of much assistance in Lisp.
once I understood what they were doing, they were useful mathematical functions and transformations, but they required you to shape your problem in a particular way before you could use them. some of these "contortions" would not be something a Lisp compiler would ordinarily engage in at all. however, I wanted to be able to use these instructions and looked at how I would have to go about that, but in the end, I decided against adding instructions to the compiler.
| How do other applications make use of these features?
inlined assembly, or even whole functions in assembly language. there are few compilers out there that can make full use of these things, but the instruction scheduling isn't terribly hard to codify. the best part is that when it's done properly, it doesn't affect performance on the older processors. the new instructions are sometimes expanded as macros by the better assemblers into instruction sequences for older processors.
| Isn't there some way to use the instructions and trap into macrocode when | running on an older processor?
there are a whole bunch of CPU flags which tell you whether these new features are available or not, so you would typically prepare a fat binary that contained several versions of the same function according to processor to make optimal use of these features. many games do this.
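The following is a minimal sketch of the "several versions of the same function" idea. It is written in Python purely for illustration: the program picks one implementation at startup from the set of processor features it was told about. In real code the feature set would come from the CPU's feature flags (CPUID on x86). Here it is simply passed in as strings, and the names `dot_wide`, `select_dot`, and the `"sse"` requirement are hypothetical.

```python
from typing import Callable, Iterable, List, Sequence, Set, Tuple

Dot = Callable[[Sequence[float], Sequence[float]], float]

def dot_generic(a: Sequence[float], b: Sequence[float]) -> float:
    """Baseline version: runs on any processor."""
    return sum(x * y for x, y in zip(a, b))

def dot_wide(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in for a version built with newer vector instructions.
    It only mimics the idea by handling four elements per step."""
    n = min(len(a), len(b))
    total = 0.0
    i = 0
    while i + 4 <= n:
        total += a[i] * b[i] + a[i + 1] * b[i + 1] + a[i + 2] * b[i + 2] + a[i + 3] * b[i + 3]
        i += 4
    for j in range(i, n):  # leftover elements
        total += a[j] * b[j]
    return total

# Most demanding variant first; feature names are illustrative only.
VARIANTS: List[Tuple[Set[str], Dot]] = [
    ({"sse"}, dot_wide),
    (set(), dot_generic),
]

def select_dot(cpu_features: Iterable[str]) -> Dot:
    """Return the first variant whose required features are all present."""
    available = set(cpu_features)
    for required, impl in VARIANTS:
        if required <= available:
            return impl
    raise RuntimeError("no usable implementation")

# Chosen once at startup, then called like any ordinary function.
dot = select_dot({"mmx", "sse"})
```

Because the choice is made once, older processors simply get the baseline version and pay no per-call cost for the existence of the faster one.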
Discussion subject changed to "Lisp hardware not just about performance. Re: Transmeta's Crusoe as a host for neo-Lispm, or not?" by Christopher R. Barry
Tim Bradshaw <t...@cley.com> writes: > * Christopher Browne wrote:
> > a) Deploy newer "blindingly fast" Lisp hardware, or > > b) Rewrite the code for newer hardware.
> I think this slightly misses my point -- though less badly than many > others have!
> What I am trying to say is that it's not really necessary to worry > about hardware at this point. Good Lisp systems on stock hardware are > not that much slower than the fastest language implementations that > currently run on that same hardware.
This whole thread (which I am now catching up on, having finally done most of my homework...) has so far only debated the performance aspects of Lisp vs. non-Lisp hardware. Why do we not have locatives in Allegro CL or LispWorks, though? Or a debugger that works as well as the Symbolics one? In my experience, it _always_ and _reliably_ gives you all the information, with names, about everything at any level on the stack. From breakpoints in your program, it also lets you evaluate forms interactively in the environment of the breakpoint and modify that environment before continuing.
Theoretically, I suppose the Franz and Harlequin people could make code compiled with (DEBUG 3) five times slower and larger and add all this Symbolics functionality to the debugger. But the important point I am making is that they of course haven't, and they possibly would if there were a reasonably efficient and not too painful way to do it.
The other important point is that Lisp hardware makes compiler writing and low-level Lisp hacking easier. It's more fun to look at the output of DISASSEMBLE on a Lisp machine than on an Intel processor, though ultimately the code may not run faster than it would on an Intel....
Christopher
[Who today handed in a programming assignment not due until Feb. 16, because he got to use Lisp and bang the thing out in less than a day's time while all his classmates get to suffer in C++ hell. And I banged the thing out despite not having programmed for some time and wasting a lot of time on a minor bout of temporary stupidity about arrays and fill-pointers....]
"Christopher R. Barry" wrote: > Or a debugger that works as well as > the Symbolics one which (in my experience) _always_ and _reliably_ > gives you all information with names about everything at any level on > the stack and from breakpoints in your program lets you evaluate forms > interactively in the environment of the breakpoint as well as modify > this environment before continuing.
If you don't have a particular need to debug compiled code, you may try an interpreted version. Regarding the names and values of all variables in the (compiled) lexical environment, it depends on whether the variable is visible at the particular point. For example, variables bound by LET may be thrown away after their last use, before the physical end of the lexical environment.
> Theoretically I suppose the Franz and Harlequin people could make code > compiled with (DEBUG 3) 5 times slower and larger and add all this > Symbolics functionality to the debugger. But the important point I am > making is that they of course haven't, and they possilby would if > there was perhaps a reasonably efficient and not too painful way to do > it.
It must require a lot more resources and risk-tolerance to reimplement CL on a specialised chip than to bring current implementations to the Symbolics level of usability if there is such a gap.
> This whole thread (which I am now catching up on, having finally done > most of my homework...) has so far only debated the performance > aspects of Lisp vs. non-Lisp hardware. Why do we not have locatives in > Allegro CL or Lispworks though? Or a debugger that works as well as > the Symbolics one which (in my experience) _always_ and _reliably_ > gives you all information with names about everything at any level on > the stack and from breakpoints in your program lets you evaluate forms > interactively in the environment of the breakpoint as well as modify > this environment before continuing.
I think the locatives issue is a reasonably good point (though you can actually live without them for most purposes). Cheap (non-consing) locatives probably come close to ruling out an implementation on stock hardware, and thus (IMO) they should never be in CL-the-standard.
I think the answers to the other questions basically are lack of resources and lack of customer demand. Duane Rettig gave a presentation at last year's LUGM which described some support for some very cool-looking debugging stuff (I forget the details, I'm afraid; Duane is probably reading this and has them...) which looked to me like it could really do a lot of interesting things. I'm reasonably confident that if you offered to fund one of the vendors to produce really cool debugging tools, they would come up with the goods. Remember how much money was spent at Symbolics...
Another point to bear in mind is that an optimising compiler often compiles away all sorts of things that you think are there. The Symbolics one tended not to, but then it didn't do very much optimisation.
(Incidentally, the Symbolics debugger has some pretty buggy areas; in particular, the looking-at-source stuff just doesn't work in many cases.)
Tim Bradshaw <t...@cley.com> writes: > Another point to bear in mind is that an optimising compiler often > compiles away all sorts of things that you think are there.
But (SPEED 0) (DEBUG 3) or appropriate should make that a non-issue.
> The symbolics one tended not to but then it didn't do very much > optimisation.
> (Incidentally, the symbolics debugger has some pretty buggy areas, > particularly the looking-at-source stuff just doesn't work in many > cases).
I found it always worked as long as you remembered to compile with "source locators" toggled on. I guess most of the system sources were compiled with them off, since they are supposed to bloat everything and make it slower. In Zmacs you compile a form with source locators temporarily toggled by doing c-m-sh-C instead of just c-sh-C (IIRC).
Robert Monfera <monf...@fisec.com> writes: > Regarding the names and values of all variables in the (compiled) > lexical environment, it depends on whether the variable is visible > at the particular point.
With (DEBUG 3) and low speed, visibility should correspond to what it conceptually is to a human.
> For example, variables bound by LET may be thrown away after their > last use, before the physical end of the lexical environment.
As they should be by the compiler; but when (DEBUG 3) and low speed is set, and I set a breakpoint within the live range of the lexical variables, even if it occurs after their last use and the compiler is free to clobber them, I want to see their names and all their info.
"Christopher R. Barry" wrote: > With (DEBUG 3) and low speed, visibility should correspond to what it > conceptually is to a human.
I agree that this should be the very purpose of (DEBUG 3). What I am asking is practical, not conceptual: is there something (variable names, ability to change them) that you don't get when you debug an _interpreted_ function?
Robert Monfera <monf...@fisec.com> writes: > "Christopher R. Barry" wrote:
> > With (DEBUG 3) and low speed, visibility should correspond to what it > > conceptually is to a human.
> I agree that this should be the very purpose of (DEBUG 3). What I am > asking is practical, not conceptual: is there something (variable names, > ability to change them) that you don't get when you debug an > _interpreted_ function?
So far that seems to work okay. Unfortunately, if you are working with a large program with many components that aren't anywhere near fully debugged, running all of them interpreted is just _too_ slow. It can be hundreds of times slower, compared with maybe 1.5-5 times slower for (DEBUG 3).
> So far that seems to work okay. Unfortunately, if you are working with > a large program with many components that aren't near fully debugged, > running all of them interpreted is just _too_ slow (like hundreds of > times instead of maybe 1.5-5 times or whatever for (DEBUG 3)).
Why would you do that? Run it all compiled, wait till it breaks, and then rerun the bit that broke with that function interpreted.
Tim Bradshaw <t...@cley.com> writes: > * Christopher R Barry wrote:
> > So far that seems to work okay. Unfortunately, if you are working with > > a large program with many components that aren't near fully debugged, > > running all of them interpreted is just _too_ slow (like hundreds of > > times instead of maybe 1.5-5 times or whatever for (DEBUG 3)).
> Why would you do that? Run it all compiled, wait till it breaks and > then rerun the bit that broke with that functuion interpreted.
Because it might be a "deep" bug: one of those that only shows up once every two weeks or months. It appears only in rare circumstances, when the 5+ different conditions for the bug are met. When you get into the debugger, you are going to have to work with whatever you've got on the stack.
Genera is pretty buggy, I think, but it's really nice how you land in the debugger with loads of information instead of having things just crash/core-dump/whatever.
Note that I am _very_ _very_ pleased with the Allegro CL debugger, at least. I've spent much time with it, and it is an excellent tool that, while not perfect (neither is the Symbolics one, nor could any debugger be "perfect"), one can be very productive with.
> > Why would you do that? Run it all compiled, wait till it breaks and > > then rerun the bit that broke with that functuion interpreted.
> Because it might be a "deep" bug. One of those ones that only shows up > once every two weeks or months or something when in rare circumstances > the 5+ different conditions for the bug are met and when you get into > the debugger you are going to have to work with whatever you've got on > the stack.
This scenario assumes that you would be running your application with (debug 3), possibly for months. If you consider the overhead of maintaining symbol information and avoiding the very aggressive optimizations compilers do, chances are the slowdown factor would be closer to ~100 than to 5. This is only my guess, and I don't know how much faster a "genuine" (debug 3) would be compared to interpreted code.
Also, I found that these types of elusive errors come up when I assure the compiler that something will be of a certain type and then don't keep the promise. These could be caught with a higher safety and lower speed level.
> Because it might be a "deep" bug. One of those ones that only shows up > once every two weeks or months or something when in rare circumstances > the 5+ different conditions for the bug are met and when you get into > the debugger you are going to have to work with whatever you've got on > the stack.
But I think running with a (DEBUG 3) that does what you want is going to cause you really serious slowdown in any case. Optimizing compilers optimize for a reason...
I've found that I can run with fairly high optimization settings and still have enough "debug-ability" for most purposes. I laugh at nearby C++ programmers who seem to need two versions, one with -g and one without. I sometimes have to resort to running something interpreted, but not very often.
Someone mentioned complex or deep bugs where you're glad you have lots of debugging info there all the time, rather than having to run again and hope the same problem occurs. But I find that in difficult debugging cases I usually have to rerun anyway, because some of the information I need is no longer on the stack in any case.
Another useful technique is to build some debugging tools of your own, something that is fairly easy in Lisp. For instance, when I had some process-like things that sent "messages" to each other, I wrote some things to let me monitor messages of specified types or, more generally, let me specify an arbitrary function that said which messages were "interesting".
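Here is a rough sketch of that kind of home-made monitor. It is written in Python rather than Lisp, and the names (`MessageMonitor`, `watch`, `watch_types`, `send`) are hypothetical throughout. Messages are dicts with a `"type"` key. Any registered predicate can mark a message as interesting, and the message is then delivered unchanged.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]

class MessageMonitor:
    """Records 'interesting' messages passed between process-like objects."""

    def __init__(self) -> None:
        self._filters: List[Callable[[Message], bool]] = []
        self.log: List[Message] = []

    def watch(self, predicate: Callable[[Message], bool]) -> None:
        """Register an arbitrary test for interesting messages."""
        self._filters.append(predicate)

    def watch_types(self, *types: str) -> None:
        """Convenience: watch every message whose type is in TYPES."""
        wanted = set(types)
        self.watch(lambda msg: msg.get("type") in wanted)

    def send(self, deliver: Callable[[Message], None], msg: Message) -> None:
        """Record MSG if any filter matches, then deliver it unchanged."""
        if any(f(msg) for f in self._filters):
            self.log.append(msg)
        deliver(msg)
```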
I took a look at this chip as a Lisp processor, out of curiosity. It's been my assessment for some time that CPU cycles are mostly "free", such that a 10-times instruction-level inefficiency isn't much of a performance hit today. Running a natively compiled, run-time-typed Lisp-like language, you still run circles around these other languages. Only C/C++/Fortran is faster at raw CPU efficiency, and who cares.
The performance bottleneck of today's (web) apps is database access/update. And it's here where there is opportunity to show massive performance advance and scalability over traditional languages and systems by using a native dynamic persistent object system. This doesn't need any special hardware. It needs something else.
Kelly Edward Murray wrote: > The performance bottleneck of today's (web) apps > is database access/update.
Given the ~$2k/GB price of memory today, this is questionable. If database performance is the bottleneck, it's best to store the entire database in the memory and use a memory-optimized representation of data. For example, I am working on a representation that stores class instances without the overhead of type information, making it even more compact and cache-aware. There is no way I can compare its performance with a disk-based database, even if there's enough memory for the disk-based one to fit in the memory.
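The following sketches one possible reading of "without the overhead of type information". It is an assumption, not the poster's actual design, and is written in Python. Each slot of a class is stored in its own typed array, so an instance is only a row index and no per-object header or type tag is kept. `ColumnTable` and its slot names are hypothetical.

```python
from array import array
from typing import Tuple

class ColumnTable:
    """Hypothetical column store: one typed array per slot.
    An 'instance' is just a row index, so no per-object header
    or type tag is stored alongside the data."""

    def __init__(self) -> None:
        self.ids = array("q")     # 64-bit signed integers
        self.prices = array("d")  # 64-bit floats
        self.qty = array("q")     # 64-bit signed integers

    def add(self, ident: int, price: float, qty: int) -> int:
        self.ids.append(ident)
        self.prices.append(price)
        self.qty.append(qty)
        return len(self.ids) - 1  # the row index stands in for the object

    def row(self, i: int) -> Tuple[int, float, int]:
        return self.ids[i], self.prices[i], self.qty[i]

    def bytes_used(self) -> int:
        """Payload size only; ignores each array's small fixed overhead."""
        return sum(col.itemsize * len(col) for col in (self.ids, self.prices, self.qty))

    def total_value(self) -> float:
        # Scans two contiguous columns, which tends to be cache-friendly.
        return sum(p * q for p, q in zip(self.prices, self.qty))
```

Each row costs exactly 24 bytes of payload here, however many rows there are.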
The number one cost in a web server project is manpower, and usually a lot of time is spent on database optimizing and schema denormalization. Memory cost is lower and more predictable. In my experience, a 10GB disk-based dataset fits nicely in 1GB of memory. This is because of the much lower need for denormalization and indices, and the better tailorability of the atomic data representation.
> I took a look at this chip as a lisp processor, > out of curiousity. Its been my assesment > for some time that CPU cycles are mostly "free", > such that a 10-times instruction-level inefficiency isn't much of a > performance hit today, > and running natively compiled > run-time typed lisp-like language, you still run circles around these > other languages. > Only C/C++/Fortran is faster at raw CPU efficiency, > and who cares.
I've heard exactly that claim from people who do serious (commercial) computationally-intensive stuff -- basically cycles are now free, it's cache misses that cost you, for `cache' being one of registers, cache, and memory of various flavours. I guess for web / network apps you'd also want to add `disk' to this -- going over the network is pretty bad...
Robert Monfera <monf...@fisec.com> writes: > Given the ~$2k/GB price of memory today, this is questionable. If > database performance is the bottleneck, it's best to store the entire > database in the memory and use a memory-optimized representation of > data.
For some databases, you can't get all of it into memory (unless you can afford hundreds of terabytes of RAM and have hardware, OS and a runtime system that will support it).
And for those where you can, there are at least two problems:
- slow response before you get enough cache hits
- garbage collection in *huge* lisp programs is a tricky subject, at least if you demand 24x7 operability.
Espen Vestre wrote: > For some databases, you can't get all of it into memory (unless > you can afford hundreds terabytes of RAM and have hardware, OS > and a runtime system that will support it).
Yes, this is true, but the context was web servers and OODB as the alternative. I have yet to hear of an OODB that handles that amount of data. However, I think that most intranet or internet web server images would fit in a few GB. Even for larger projects, the memory costs may be insignificant compared to the combined effort of the project.
> And for those where you can, there are at least two problems:
> - slow response before you get enough cache hits
If everything is in the physical memory, how could it not beat disk-based access? Maybe you think of the necessary initial upload, which is done when you launch the image, rather than at the first access, or that the image size would exceed the size of the physical memory.
> - garbage collection in *huge* lisp programs is a tricky subject, > at least if you demand 24x7 operability.
When there is a massive amount of data, we already avoid GC - for example, elements of fully declared or fixnum arrays are not GC'd individually, and the array itself will become old soon - even if you don't use ACL's :allocation :old option. I also find that arrays created for temporary purposes are fast to create and GC - maybe it does not even have to be copied before it's freed, given enough workspace and general avoidance of large-scale consing.
I think that Lisp's GC is an asset when one demands high uptime, maybe Erik or somebody else who has done it has some caveats.
Robert Monfera <monf...@fisec.com> writes: > Yes, this is true, but the context was web servers and OODB as the > alternative.
Ok, my current context is a server that caches parts of a ~500GB relational database.
> If everything is in the physical memory, how could it not beat > disk-based access? Maybe you think of the necessary initial upload, > which is done when you launch the image, rather than at the first > access, or that the image size would exceed the size of the physical > memory.
No, as I already said, I was thinking in terms of an application that uses a relational database as a backend.
> When there is a massive amount of data, we already avoid GC - for
I'm not quite sure what you mean. Have you turned off global GC? Maybe I'm missing something which could be essential to my application ;-) -- (espen)
Espen Vestre <espen@*do-not-spam-me*.vestre.net> writes: > > disk-based access? Maybe you think of the necessary initial upload, > > which is done when you launch the image, rather than at the first > > access, or that the image size would exceed the size of the physical > > memory.
> No, as I already said, I was thinking in terms of an application > that uses a relational database as a backend.
let me elaborate on that: I completely agree that memory-only solutions are very interesting for rather static databases, but I'm working with databases that are highly dynamic (*lots* of inserts and updates). If you try to do without any RDB or OODB as a persistent store backend for such a database, wouldn't that mean that you'd have to reinvent a whole lot of old database wheels? After all, e.g. Oracle is pretty clever at caching (we run Oracle servers which use caches of more than 500MB). -- (espen)
>> > disk-based access? Maybe you think of the necessary initial upload, >> > which is done when you launch the image, rather than at the first >> > access, or that the image size would exceed the size of the physical >> > memory.
>> No, as I already said, I was thinking in terms of an application >> that uses a relational database as a backend.
>let me elaborate on that: I completely agree that memory-only solutions >are very interesting for rather static databases, but I'm working >with databases that are highly dynamic (*lots* of inserts and updates). >If you try to do without any RDB or OODB as a persistent store backend >for such a database, wouldn't that mean that you'd have to reinvent >a whole lot of old database wheels? After all, e.g. Oracle is pretty >clever at caching (we run Oracle servers which use caches of more >than 500MB).
A possibly similar approach is taken by FastDB <http://www.ispras.ru/~knizhnik/fastdb.html>. It combines in-memory storage with a transactional scheme that pushes updates to a transaction log on disk immediately, just as traditional DBMSes do.
If you head back to System R, the original RDBMS, and successors moving through to the big-name RDBMSes like Oracle/Informix/DB2, the common thread is that they all do something analogous to demand paging. This happens for much the same reason that UNIXes and POSIXes almost all (QNX being a visible exception) do demand paging. At the time they were developed, you couldn't possibly have enough memory to hold the whole database in memory.
The natural evolution that comes from that beginning is that the approach to DBMS implementation is to start by creating a demand-paging system under the assumption that you *don't* have enough RAM to hold the database in memory.
It then makes sense to do heavy-duty caching to minimize the negative impact of this.
Caché, TimesTen, and FastDB make the contrary assumption that the whole DB *can* be stored in RAM. You still make sure that updates get pushed out to disk immediately to keep things robust.
I'll suggest the thought that you may have things backwards. In a relatively static database, there's likely to be some locality of reference that may mean that paging turns out to be cheap. If it's really dynamic, being able to find all your data in RAM is going to improve performance over having to do a lot of paging to get at the data.
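Here is a toy sketch of the in-memory-plus-log arrangement, assuming a simple key/value model. The class `InMemoryStore` and its JSON log format are hypothetical and do not reflect how Caché, TimesTen, or FastDB actually work. Reads never touch the disk. Each update is appended to a log and `fsync`ed before it is applied in RAM, so the state can be rebuilt by replaying the log after a restart.

```python
import json
import os
import tempfile
from typing import Any, Dict, Optional

class InMemoryStore:
    """Whole dataset lives in a dict; every change is appended to an
    on-disk log and fsync'ed before it is applied in memory."""

    def __init__(self, log_path: str) -> None:
        self.log_path = log_path
        self.data: Dict[str, Any] = {}
        if os.path.exists(log_path):
            self._replay()  # rebuild RAM state from the log at startup
        self._log = open(log_path, "a", encoding="utf-8")

    def _replay(self) -> None:
        with open(self.log_path, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    self._apply(json.loads(line))

    def _apply(self, op: Dict[str, Any]) -> None:
        if op["op"] == "put":
            self.data[op["key"]] = op["value"]
        elif op["op"] == "delete":
            self.data.pop(op["key"], None)

    def _commit(self, op: Dict[str, Any]) -> None:
        # Write-ahead: the log record is on disk before memory changes.
        self._log.write(json.dumps(op) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        self._apply(op)

    def put(self, key: str, value: Any) -> None:
        self._commit({"op": "put", "key": key, "value": value})

    def delete(self, key: str) -> None:
        self._commit({"op": "delete", "key": key})

    def get(self, key: str, default: Optional[Any] = None) -> Any:
        return self.data.get(key, default)  # pure RAM lookup, no disk I/O

    def close(self) -> None:
        self._log.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "store.log")
        store = InMemoryStore(path)
        store.put("a", 1)
        store.put("b", 2)
        store.delete("a")
        store.close()
        reopened = InMemoryStore(path)  # state rebuilt by replaying the log
        print(reopened.get("b"), reopened.get("a"))  # prints: 2 None
        reopened.close()
```

A real system would also checkpoint the in-memory state periodically, so that the log does not have to be replayed from the beginning.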
As far as keeping things robust goes, there ought to be *no* difference in the cost of writing transaction logs out to disk in either situation, as a robust transaction log will require similar operations either way. -- "I worry that the person who thought up Muzak may be thinking up something else." -- Lily Tomlin cbbro...@ntlug.org - - <http://www.hex.net/~cbbrowne/nonrdbms.html>
Espen Vestre wrote: > Ok, my current context is a server that caches parts of a ~500GB > relational database.
Can you estimate what the database size would be if you could do away with most indices, lock tables, redundancy and denormalization? What if you did not have to use long time stamps and character-based keys, but integers instead? What if you stored only the parts that are actually used, or split tables so that you don't have a lot of NULLs? Of course I'm not implying that the result would fit in memory or that a RAM-based solution would be the right one for you. But in my experience, few multi-gigabyte databases store data that is worth nearly that much.
> > When there is a massive amount of data, we already avoid GC - for
> I'm not quite sure what you mean. Have you turned off global GC? > Maybe I'm missing something which could be essential to my application ;-)
Nothing special: simply allocating large, long-living vectors whose values are immediate - no GC is done on those big chunks (up to 64MB/chunk). Oldspace will only be GC'd if newspace is full, which should not happen often or at all. What are your experiences with this?
As for tuning, I allocate medium-short-lived (1-60 seconds) vectors (temporary indices etc.) too, which are good candidates to be tuned - not much need to do so so far.
You are right that in the case of RAM-based databases there is probably still a need to reinvent (learn and implement) some common DBMS techniques, like not actually deleting records (rows) from arrays, just flagging them, and compacting arrays eventually.
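Here is a small sketch of the flag-then-compact technique, using a hypothetical `FlaggedTable` class. Deletion just clears a flag. A separate compaction pass removes flagged rows once enough of them have accumulated (the `compact_ratio` threshold is an assumed tuning knob). Because compaction renumbers rows, it returns a mapping from which any index into the table would have to be rebuilt.

```python
from typing import Any, Dict, Iterator, List

class FlaggedTable:
    """Rows live in a plain list; deleting only clears a 'live' flag.
    Flagged rows are physically removed later, in one compaction pass."""

    def __init__(self, compact_ratio: float = 0.5) -> None:
        self.rows: List[Any] = []
        self.live: List[bool] = []
        self.dead = 0
        self.compact_ratio = compact_ratio  # hypothetical tuning knob

    def insert(self, row: Any) -> int:
        self.rows.append(row)
        self.live.append(True)
        return len(self.rows) - 1

    def delete(self, index: int) -> None:
        if self.live[index]:  # O(1): nothing is shifted or copied
            self.live[index] = False
            self.dead += 1

    def needs_compaction(self) -> bool:
        return self.dead > self.compact_ratio * len(self.rows)

    def compact(self) -> Dict[int, int]:
        """Drop flagged rows and return an old-index -> new-index map,
        so that any indices pointing into the table can be rebuilt."""
        mapping: Dict[int, int] = {}
        kept: List[Any] = []
        for old, (row, ok) in enumerate(zip(self.rows, self.live)):
            if ok:
                mapping[old] = len(kept)
                kept.append(row)
        self.rows = kept
        self.live = [True] * len(kept)
        self.dead = 0
        return mapping

    def __iter__(self) -> Iterator[Any]:
        return (row for row, ok in zip(self.rows, self.live) if ok)
```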
Does Oracle do caching more intelligently than fetching records that have identical key parts as specified for each table and dumping the oldest-accessed records?
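For reference, the baseline policy the question describes (fetch on demand, evict the least recently accessed record) can be sketched as an LRU cache. This is a generic illustration, not Oracle's buffer-cache algorithm, and `LRUCache` and its `fetch` callback are hypothetical.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable

class LRUCache:
    """Keeps at most CAPACITY records. On a miss, it fetches from the
    backing store and evicts the least recently accessed record if
    over capacity."""

    def __init__(self, capacity: int, fetch: Callable[[Hashable], Any]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._fetch = fetch
        self._items: OrderedDict = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Any:
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = self._fetch(key)  # e.g. a disk read in a real DBMS
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the oldest-accessed entry
        return value

    def __contains__(self, key: Hashable) -> bool:
        return key in self._items
```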
Robert Monfera <monf...@fisec.com> writes: > 64MB/chunk). Oldspace will only be GC'd if newspace is full, which > should not happen often or at all. What are your experiences with this?
Since I've just started to implement massive caching in my servers, I'm still a newbie wrt. GC, but I simply have to turn automatic global GC completely off, since the downtime that a global GC in a several- hundred-megabytes lisp image would mean is not acceptable. Since it's rather new, I'm still not sure what turning it off really implies, but I could always let my internal 'cron' thread do it in some predefined weekly service interval...
In article <ey3901k7fcm....@cley.com>, Tim Bradshaw <t...@cley.com> writes: >* Joe Marshall wrote:
>> Current processors are `C machines'. A custom processor designed for >> Lisp using current technology would significantly outperform a Lisp >> implementation on stock hardware.
>I believe there are papers around written by Lisp/OO people which >argue fairly strongly against this point of view. I'd be interested >in any compelling arguments otherwise (especially from lisp compiler >implementors).
Well, I was one of the principal implementors of Common Lisp for Data General before Common Lisp was "official" (late '70s or early '80s). We worked on a "stock" monstrosity with 48 different instruction "formats" and so many instructions they were hard to count (a kludge, on a kludge, on a kludge, of a clean simple machine). Although there were still a significant number of optimizations yet to be added to the compiler, I studied several functions in detail at the assembly language level. I compared Lisp's output with Fortran's output (DG had a very good Fortran compiler) and with the best I could think of writing directly in assembly language. The Lisp output was distinctly better than the Fortran output, and only marginally worse than the best I could create by hand. The "missing" optimizations centered around two things:

1. Better utilization of typing information which the user didn't supply, but that was implicitly available from the functions invoked, from predicates used in conditions, from arms in typecase, and from similar places.
2. Doing a complete transformation to the applicative domain, optimizing there, and then transforming back to the normal imperative domain. In the applicative domain, variables don't "vary" but are only "initialized" and used, if at all, in a read-only way. This makes it very easy to do dataflow optimizations and to remove dead computations in a safe way. Once those things are done, the inverse transform back to the imperative domain gets rid of all the tail recursion and turns it back into loops and variables.
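The last step of the second point (applicative form back to imperative form) can be illustrated on a toy function. The example is in Python and is only a sketch of the idea, not of the DG compiler, and the function names are hypothetical. In the applicative form, the "loop" is a tail call whose parameters are freshly initialized each time. The inverse transform turns that tail call into a jump, so the parameters become loop variables updated in place.

```python
from typing import Sequence

def sum_squares_rec(xs: Sequence[int], i: int = 0, acc: int = 0) -> int:
    """Applicative form: no variable is ever reassigned; each
    'iteration' is a tail call with freshly initialized parameters."""
    if i == len(xs):
        return acc
    return sum_squares_rec(xs, i + 1, acc + xs[i] * xs[i])

def sum_squares_loop(xs: Sequence[int]) -> int:
    """Imperative form obtained by turning the tail call into a jump:
    the parameters become loop variables. The tuple assignment
    evaluates both new values before rebinding, just as a call would."""
    i, acc = 0, 0
    while i != len(xs):
        i, acc = i + 1, acc + xs[i] * xs[i]
    return acc
```

Python does not eliminate tail calls, so the recursive version is limited by the interpreter's recursion depth, while the loop version is not.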
If we had descended into writeable control store, we could, perhaps, have done a few things a little bit better, but nowhere near twice as well - more like ten percent or less. For SOME programs, such as those with large bignums, we could have benefited more by having microcode for the boole and logxxx family, as well as the arithmetic and comparison functions, and the mask and test etc functions.
A VLIW machine typically is NOT a good machine for anything. The problem is that a VLIW machine is just the same as a "normal-sized" instruction machine, with an added restriction: the instructions must be gulped in certain patterns (four at a time, or 2 of kind one and 1 of kind 2, or many other variants). The HOPE is that the compiler will be able to use the other parts of the instruction word. The fact is you have just restricted the compiler's freedom, as well as that of any assembly language programmer. Unless the "benefit" of wide gulps and simplified memory caching logic can be made to exceed the loss from the restricted freedom, you have a net loss. But there is little benefit, as instructions can still be "gulped" with a wide bus and "shifted" into place inside the chip with only one gate level. No way will that ever approach the loss caused by forced idling of ALU units because of VLIW format restrictions.
Note: I said typically. The IA-64 may have found a way around the typical problem - but lacking benchmark numbers, I doubt it.
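To make the "forced idling" argument concrete, here is a toy model in Python. It is an assumption for illustration, not the IA-64's real bundle templates. Instructions are issued strictly in program order into bundles with two ALU slots and one memory slot (loosely following "2 of kind one, and 1 of kind 2"), and any slot that cannot be filled becomes a NOP.

```python
from typing import Dict, List, Sequence

# Hypothetical bundle format: two ALU slots and one memory slot.
TEMPLATE: Dict[str, int] = {"alu": 2, "mem": 1}

def pack_in_order(kinds: Sequence[str],
                  template: Dict[str, int] = TEMPLATE) -> List[List[str]]:
    """Pack instructions, in program order, into fixed-format bundles.
    A bundle is closed as soon as the next instruction's slot kind is
    used up; its remaining free slots are filled with NOPs."""
    bundles: List[List[str]] = []
    current: List[str] = []
    free = dict(template)
    for kind in kinds:
        if template.get(kind, 0) < 1:
            raise ValueError(f"no slot for instruction kind {kind!r}")
        if free[kind] == 0:
            bundles.append(current + ["nop"] * sum(free.values()))
            current, free = [], dict(template)
        current.append(kind)
        free[kind] -= 1
    if current:
        bundles.append(current + ["nop"] * sum(free.values()))
    return bundles

def idle_slots(bundles: List[List[str]]) -> int:
    """Count issue slots wasted on NOPs."""
    return sum(slot == "nop" for bundle in bundles for slot in bundle)
```

For example, the sequence mem, mem, alu, mem needs three bundles and wastes five of the nine slots. A compiler free to reorder independent instructions could fill some of those slots, which is the hope the post mentions. Wherever data dependencies prevent reordering, the slots stay idle.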
In article <388E2FE0.F877F...@melbpc.org.au>, Tim Josling <t...@melbpc.org.au> writes:
>From: Tim Josling <t...@melbpc.org.au> >Date: Wed, 26 Jan 2000 10:21:04 +1100
>In my experience with hardware and software vendors, if benchmark >information is not available, then the numbers can be assumed to >be bad.
insert "USUALLY" before assumed. The other reason benchmarks aren't reported is politics, when there is a "standards group" controlling things, and all the competitors vote on what the rules are. So then, somebody finds a loophole in the testing rules, publishes an "unfair" number, the rules are changed, and much later somebody invents a new technique - that is "technically" in violation of the wording of the rule designed to stop a certain kind of cheating. This is so ESPECIALLY if the rule broken makes the entire product a great deal better for the customers, as that is bad news for the majority of the standards group. Bingo - the world can't hear about the great results. Politics.
But, this is unlikely to be the case with the IA-64.