Transmeta's Crusoe as a host for neo-Lispm, or not?

Russell Senior

Jan 19, 2000

I'd be interested to hear knowledgeable people speculate on the
significance of the new Crusoe chip from Transmeta with its
code-morphing technology as a host for a new lisp-machine.

--
Russell Senior ``The two chiefs turned to each other.
sen...@aracnet.com Bellison uncorked a flood of horrible
profanity, which, translated meant, `This is
extremely unusual.' ''

Robert Monfera

Jan 19, 2000

Russell Senior wrote:

> I'd be interested to hear knowledgeable people speculate on the
> significance of the new Crusoe chip from Transmeta with its
> code-morphing technology as a host for a new lisp-machine.

Maybe hardware type/class tags, GC assistance, ...? Have you looked at
the patents they filed? That probably reveals some of the
possibilities.

It may not have to be a Lisp-machine to benefit from the flexibility; maybe
a Linux application can simply add extensions to the currently configured
behavior. Maybe Lisp would have much faster implementations on it than
other languages?

What about Merced? It has instruction scheduling quite different from
current processors, e.g., numerous and long pipes, executing multiple
branches in parallel and dropping all but one when the selected branch
becomes known. Maybe it benefits Lisp (e.g., a numerical operation can
assume and perform a variety of types for the operand(s) in parallel).
Maybe it is a lot of effort to create a port for Merced (at least they
say writing a new C compiler that supports the new concepts is quite an
effort, and whether performance gains are realized over x86 primarily
depends on the quality of compilation).

I'm wondering what people with _actual_ experience in compiler
development think :-)

Robert

Robert Monfera

Jan 20, 2000

Still not being knowledgeable, here's a URL:

http://www.transmeta.com/crusoe/download/pdf/crusoetechwp.pdf

As the core is VLIW (64-128 bits), it could manipulate entire cons cells
or long floats.

Because of the Code Morphing, it may be possible to design a
Lisp-optimized microcode or to take the legacy Lispm instructions.

Robert

Russell Senior wrote:
>
> I'd be interested to hear knowledgeable people speculate on the
> significance of the new Crusoe chip from Transmeta with its
> code-morphing technology as a host for a new lisp-machine.
>

Tim Bradshaw

Jan 20, 2000

* Russell Senior wrote:
> I'd be interested to hear knowledgeable people speculate on the
> significance of the new Crusoe chip from Transmeta with its
> code-morphing technology as a host for a new lisp-machine.

Whenever these things come up, I feel compelled to ask: why?
In particular: what is so wrong with the processors we currently have
for Lisp? There have been a number of recent threads here about
performance of lisps, and the end result seems to be that Lisp systems
are well under a factor of 2 slower than C. And this is with a
relatively minuscule effort put into the lisp systems -- when was the
last time anyone really looked at cmucl's SPARC backend for instance?

So what is the big win supposed to be?

--tim

Espen Vestre

Jan 20, 2000

Russell Senior <sen...@aracnet.com> writes:

> I'd be interested to hear knowledgeable people speculate on the
> significance of the new Crusoe chip from Transmeta with its
> code-morphing technology as a host for a new lisp-machine.

hmm,
at some point there were some rumors connecting Transmeta to
lisp... haven't heard more about that, though.
--
(espen)

Michael Livshin

Jan 20, 2000

Espen Vestre <espen@*do-not-spam-me*.vestre.net> writes:

well, one of their 'code morphing' people is called Bill Rozas (one of
the primary authors of MIT Scheme). don't know if that's a
"connection", though.

it also seems (from the recent discussions here) that the
current CPU architectures are quite adequate for Lisp, it's the OS's
virtual memory subsystem that usually sucks.

> --
> (espen)

--mike

--
only legal replies to this address are accepted.

May all your PUSHes be POPped.

Tim Olson

Jan 20, 2000

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

In article <3886A262...@fisec.com>, Robert Monfera
<mon...@fisec.com> wrote:

| Still not being knowledgeable, here's an URL:
|
| http://www.transmeta.com/crusoe/download/pdf/crusoetechwp.pdf
|
| As the core is WLIW (64-128 bits), it could manipulate entire cons
| cells or long floats.

The VLIW instruction is either 64 or 128 bits wide, but as far as I
can tell, the processor only operates on 32-bit integer data and
80-bit floating-point data.

-- Tim Olson

-----BEGIN PGP SIGNATURE-----
Version: PGPfreeware 6.5.2 for non-commercial use <http://www.pgp.com>

iQA/AwUBOIcgB9mKVoJv3iXaEQJzawCgsDgxtv3c2t0LUNQqW/39VwfLtnwAn0fA
yFs3LFcAUe8w0Ez3iFsSwJiX
=68MD
-----END PGP SIGNATURE-----

Joe Marshall

Jan 20, 2000

* Russell Senior wrote:
> I'd be interested to hear knowledgeable people speculate on the
> significance of the new Crusoe chip from Transmeta with its
> code-morphing technology as a host for a new lisp-machine.

In order to compete in the PC processor market, you have to be compatible
with Intel. Innovating is very hard because there is no compelling reason
for the market to choose a non-compatible system for price or performance
when they can simply wait for Intel to catch up. Intel has smart people
and a lot of money. Competing with them on their playing field is hard.

Transmeta's code-morphing technology is targeted at separating the
processor architecture from the processor instruction set. They hope that
being completely compatible with Intel, having a similar price/performance
break, *and* running at lower power will be a compelling reason to choose
their product. In addition, they hope that as Intel improves their chips,
the freer design constraints for the underlying hardware will enable them
to stay ahead.

I doubt very much that Transmeta has any agenda beyond this. I don't
see any reason to believe that the code-morphing would give any benefit
to lisp above any other language. In fact, I would believe that the
code-morphing software is tuned to recognize the kind of code that
Microsoft Visual C++ typically produces.

Tim Bradshaw <t...@cley.com> wrote in message
news:ey3ogah...@cley.com...

>
> Whenever these things come up, I feel compelled to ask: why?
> In particular: what is so wrong with the processors we currently have
> for Lisp? There have been a number of recent threads here about
> performance of lisps, and the end result seems to be that Lisp systems
> are well under a factor of 2 slower than C. And this is with a
> relatively minuscule effort put into the lisp systems -- when was the
> last time anyone really looked at cmucl's SPARC backend for instance?

Current processors are `C machines'. A custom processor designed for Lisp
using current technology would significantly outperform a Lisp
implementation on stock hardware.

However.... you are still competing with Intel. If you don't regularly
increase the performance of your Lisp machine, the relative advantage in
performance will rapidly disappear. For instance, you could emulate a lisp
machine on a Pentium and it will outperform an actual lisp machine.

~jrm

Erik Naggum

Jan 20, 2000

* "Joe Marshall" <jmar...@alum.mit.edu>

| Current processors are `C machines'.

this belongs in comp.arch, I guess, but isn't this why Merced is a flop
so far -- that is, since it _isn't_ a traditional "C processor"?

#:Erik

Tim Bradshaw

Jan 20, 2000

* Joe Marshall wrote:

> Current processors are `C machines'. A custom processor designed for
> Lisp using current technology would significantly outperform a Lisp
> implementation on stock hardware.

I believe there are papers around written by Lisp/OO people which
argue fairly strongly against this point of view. I'd be interested
in any compelling arguments otherwise (especially from lisp compiler
implementors).

--tim

Rainer Joswig

Jan 20, 2000

In article <iMHh4.381$Rj5....@dfw-read.news.verio.net>, "Joe Marshall"
<jmar...@alum.mit.edu> wrote:

> Current processors are `C machines'. A custom processor designed for Lisp
> using current technology would significantly outperform a Lisp
> implementation on stock hardware.

I personally would expect this to be the case, too.
Developing microprocessors is not as expensive and exotic
as it used to be - we see lots of custom chips and
custom processors.

It might also be possible to take an old processor design
(the Ivory from Symbolics, ...) and redo it in a modern
process - it would be *much* faster than the old machines.

You could invest in a new paradigm and you might be able
to outperform everything that's available - this might
be much more expensive and would need a few very bright people.

Rainer Joswig, ISION Internet AG, Harburger Schlossstraße 1,
21079 Hamburg, Germany, Tel: +49 40 77175 226
Email: rainer...@ision.de , WWW: http://www.ision.de/

Joe Marshall

Jan 20, 2000

Tim Bradshaw <t...@cley.com> wrote in message
news:ey3901k...@cley.com...

> * Joe Marshall wrote:
>
> > Current processors are `C machines'. A custom processor designed for
> > Lisp using current technology would significantly outperform a Lisp
> > implementation on stock hardware.
>

> I believe there are papers around written by Lisp/OO people which
> argue fairly strongly against this point of view.

They are wrong. Let me give a couple of concrete examples:

A `read barrier' is necessary to implement `incremental scavenging', all the
Lisp Machines had hardware to check for pointers to oldspace when reading
from memory. Most implementations of Lisp on stock hardware simply punt
assuming that `generational scavenging' will be fast enough. (There may
be counter examples, but the ones I have first-hand knowledge about,
do not implement a read barrier.)
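
To make the read-barrier idea concrete, here is a minimal software sketch in Python. It is an illustration only, not how any Lisp Machine did it (those performed the oldspace check in hardware on every memory read). The names `Heap`, `OLDSPACE`, `forward` and `barrier_read` are made up for this example, and pointers are modelled as 1-tuples purely so they can be told apart from immediates.

```python
# Hypothetical model: addresses are plain ints, and oldspace is the address
# range that the incremental collector is still evacuating.
OLDSPACE = range(0, 100)
NEWSPACE_START = 100


class Heap:
    def __init__(self):
        self.cells = {}        # address -> value
        self.forwarding = {}   # oldspace address -> newspace address
        self.next_free = NEWSPACE_START

    def is_pointer(self, value):
        # Assumption of this sketch: a pointer is a 1-tuple holding an address.
        return isinstance(value, tuple)

    def forward(self, old_addr):
        """Copy one object out of oldspace (only once) and return its new address."""
        if old_addr not in self.forwarding:
            new_addr = self.next_free
            self.next_free += 1
            self.cells[new_addr] = self.cells[old_addr]
            self.forwarding[old_addr] = new_addr
        return self.forwarding[old_addr]

    def barrier_read(self, addr):
        """Every read goes through here, so no oldspace pointer ever escapes."""
        value = self.cells[addr]
        if self.is_pointer(value) and value[0] in OLDSPACE:
            value = (self.forward(value[0]),)
            self.cells[addr] = value   # repair the slot so later reads are cheap
        return value
```

The cost being argued about is visible here: without hardware help, the test in `barrier_read` has to run on every read the program performs, not just while the collector is running.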

The type codes on stock hardware have to be contorted to `fit in'.
Typically, the bottom two or three bits of a word are used to indicate
whether the object is an immediate or a pointer. However, there aren't
enough bits to describe the type of every pointer, so there are auxiliary
type bits on the object itself. This means chasing down a pointer whenever
you need to do some operation dependent upon the object's type. (Cons
cells are so important that one of the `primary' tags is used to represent
them.)
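
A rough Python sketch of the low-tag scheme just described, assuming a 32-bit word with two tag bits. The particular tag assignments and the `memory` dictionary standing in for the heap are invented for illustration and do not correspond to any specific Lisp implementation.

```python
TAG_BITS = 2
TAG_MASK = (1 << TAG_BITS) - 1

# Hypothetical primary tags held in the low two bits of a word.
TAG_FIXNUM, TAG_CONS, TAG_OTHER_POINTER, TAG_IMMEDIATE = 0, 1, 2, 3


def make_fixnum(n):
    # The value lives in the upper 30 bits; the low bits are the (zero) tag.
    return (n << TAG_BITS) & 0xFFFFFFFF


def make_pointer(address, tag):
    assert address & TAG_MASK == 0, "objects are assumed to be 4-byte aligned"
    return address | tag


def type_of(word, memory):
    tag = word & TAG_MASK
    if tag == TAG_FIXNUM:
        return "fixnum"
    if tag == TAG_CONS:
        return "cons"                      # known from the word alone
    if tag == TAG_OTHER_POINTER:
        return memory[word & ~TAG_MASK]    # extra memory reference to a header
    return "immediate"
```

Only the `TAG_OTHER_POINTER` branch touches memory, which is the pointer chase being complained about; conses and fixnums are recognised from the word itself.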

The compiler must be very careful about maintaining pointer objects in a
boxed state on stock hardware. If a GC happens, it must be able to parse
the stack and the registers to find the pointers. Lisp systems on stock
hardware often partition registers into `boxed' and `unboxed' sets to aid
the GC, or arrange for all objects to be boxed at the end of a basic block
and poll for interrupts at that time. A custom Lisp system doesn't need to
unbox anything because the logic doesn't do stupid things by
misinterpreting the tags as data.

----

The LMI K-machine would have competed favorably with a Pentium I. However
the K-machine was built in 1986 and was made almost completely in stock TTL.

The big problem with custom hardware is keeping up with Intel. I have a
Pentium II clocking at 200 MHz and I hear that we'll have 1GHz chips by
the end of the year. A 1 GHz Pentium can execute a *lot* of instructions.
It can afford to waste time boxing and unboxing bits and shuffling them here
and there. It would probably compete with a 100MHz custom lisp processor.
In two years, a 1GHz Pentium is going to look pathetically slow.

The second big problem is you can't play Doom on a Lisp machine.

Robert Monfera

Jan 20, 2000

Joe Marshall wrote:
>
> * Russell Senior wrote:
> > I'd be interested to hear knowledgeable people speculate on the
> > significance of the new Crusoe chip from Transmeta with its
> > code-morphing technology as a host for a new lisp-machine.
>

> I don't see any reason to believe that the code-morphing would give any
> benefit to lisp above any other language. In fact, I would believe that
> the code-morphing software is tuned to recognize the kind of code that
> Microsoft Visual C++ typically produces.

Chances are Russ was thinking of the option of extending or changing the
code-morphing software itself, rather than just simply running Lisp on
Crusoe as a plain old x86 binary. It could mean that some byte
sequences that are currently not used as x86 instructions can be
utilised to support additional registers, tagging or whatever Lisp
likes. It may also mean a straight reimplementation of the Lisp Machine
instructions.

On the other hand, Transmeta is likely to withhold or prohibit
technologies that would change or replace their code morphing software.

Robert

Robert Monfera

Jan 20, 2000

Joe Marshall wrote:

> (Cons cells are so important that one of
> the `primary' tags is used to represent them.)

While not debating your point, I'd like to mention that it is not
necessarily true.

If you have an array of integers and you know all of them will be
integers, one bit is enough to let the operator know whether the rest of
the bits is a 31-bit fixnum or a pointer to a bignum. If it is a
pointer, you even have at least one spare bit (as you address words, not
bytes), which you could use to differentiate between 'mednums' (e.g., 64
or 80 bit hardware or software integers or even floats with 50-something
useful bits) and very bignums (arbitrary length beyond 64 or 80 bits).

(This can be done by the user of the language, who will thus become more
of an implementor.)
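
A small Python sketch of the array-element encoding suggested above, assuming 32-bit words and 4-byte-aligned pointers. The bit assignments and the function names are hypothetical; the point is only that one bit separates fixnums from pointers and a second spare bit can separate 'mednums' from bignums.

```python
def encode_fixnum(n):
    # Low bit 0 marks a fixnum; the other 31 bits hold the signed value.
    assert -(1 << 30) <= n < (1 << 30)
    return (n << 1) & 0xFFFFFFFF


def decode_fixnum(word):
    value = word >> 1
    # Sign-extend from 31 bits.
    return value - (1 << 31) if value & (1 << 30) else value


def encode_pointer(address, is_bignum):
    # Low bit 1 marks a pointer; bit 1 is the spare bit freed by alignment.
    assert address % 4 == 0
    return address | 0b01 | (0b10 if is_bignum else 0)


def classify(word):
    if word & 0b01 == 0:
        return "fixnum"
    if word & 0b10 == 0:
        return "mednum-pointer"
    return "bignum-pointer"
```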

> A 1 GHz Pentium can execute a *lot* of instructions.
> It can afford to waste time boxing and unboxing bits and shuffling them here
> and there.

You are right: based on benchmarks, if the bits are already in a
register, a couple of wasted shifting or masking instructions do not
make a huge difference, as speed will be largely determined by how fast
you can get data from memory to cache.

This is why a fine resolution of compiler-optimizable types may be
preferable over a custom chip - instead of pulling in 16-24 bytes for a
bignum, maybe 8 bytes will do, and they could even be immediate values
in a properly declared array.

Robert

Andy Freeman

Jan 21, 2000

In article <hmLh4.900$Rj5....@dfw-read.news.verio.net>,

"Joe Marshall" <jmar...@alum.mit.edu> wrote:
> A `read barrier' is necessary to implement `incremental scavenging', all
> the Lisp Machines had hardware to check for pointers to oldspace when
> reading from memory.

I can't think of a 32 bit processor designed since the 68010 which
doesn't have hardware which can be used to implement a read barrier.
Even the processor in Sega's previous generation game box (a
processor which is also used in cell phones) has the relevant
hardware. The problem has been OS support, not HW support.

Sparc has tag support designed with input from lisp folk - I don't
know whether it actually helped. (Hmm - I'll bet that you can
use an MMU to do much of the tag stuff, and I'm sure that I'm
not the first to notice that.)

> The LMI K-machine would have competed favorably with a Pentium I.
> However the K-machine was built in 1986 and was made almost completely
> in stock TTL.

At the LMI's price point, there were lots of things in '86 which would
have competed with a Pentium I, which wasn't all that interesting.
(The PI is the first dual issue x86, but it's completely static,
like a 486. The world changed with PPro, which started today's
"everything is dynamic" trend in x86 land.) For example, the
360/91 microarchitecture (from the late 60s) had fewer gates
than any of the Lispms and, given the Lispms' implementation
technology, would have really screamed. The processor world was
very strange in the 80s.

-andy


Christopher Browne

Jan 21, 2000

On 19 Jan 2000 14:47:50 -0800, Russell Senior <sen...@aracnet.com>
concluded we would all be made wiser by knowing that:
>I'd be interested to hear knowledgeable people speculate on the
>significance of the new Crusoe chip from Transmeta with its
>code-morphing technology as a host for a new lisp-machine.

That was one of my early thoughts. Having garbage collection support
code in "microcode" might provide a decent performance boost, as might
special instructions to support critical functions like CONS and other
low level utility components. [Hashing? ASSOC? Bignum operations?
Fixnum operations? Components for sequence operations?]

More importantly, this would provide a way of getting an upgrade path
on an architecture that others have an interest in improving.

Of course, if this means custom designing the whole Lisp-oriented
instruction set, that is still a hefty design proposition. The main
"win" is that upgrades should be cheaper than they were with a
completely custom chip like Ivory.
--
"The chat program is in public domain. This is not the GNU public
license. If it breaks then you get to keep both pieces."
(Copyright notice for the chat program)
cbbr...@ntlug.org- <http://www.ntlug.org/~cbbrowne/lsf.html>

Tim Bradshaw

Jan 21, 2000

* Joe Marshall wrote:

> A `read barrier' is necessary to implement `incremental scavenging', all the
> Lisp Machines had hardware to check for pointers to oldspace when reading
> from memory. Most implementations of Lisp on stock hardware simply punt
> assuming that `generational scavenging' will be fast enough. (There may
> be counter examples, but the ones I have first-hand knowledge about,
> do not implement a read barrier.)

And this is generally a correct decision except perhaps for
hard-real-time cases. My benchmarking of reasonably carefully-written
Lisp programs indicates that they spend under 10% of their time in GC.
So however much you increase GC performance it's only a 10% win.
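
The arithmetic behind that bound is Amdahl's law. A tiny Python sketch, where `overall_speedup` is just a name chosen for this example and the 10% figure is the one quoted above:

```python
def overall_speedup(fraction, factor):
    """Amdahl's law: whole-program speedup when `fraction` of the run time
    is made `factor` times faster and the rest is left unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# GC taking 10% of run time, made twice as fast, and made (nearly) free.
gc_twice_as_fast = overall_speedup(0.10, 2.0)
gc_almost_free = overall_speedup(0.10, 1e12)
```

Even with GC made essentially free, the program cannot run more than about 1.11 times faster, i.e. it saves at most the 10% of run time that GC was using.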

> The type codes on stock hardware have to be contorted to `fit in'.
> Typically, the bottom two or three bits of a word are used to indicate
> whether the object is an immediate or a pointer. However, there aren't
> enough bits to describe the type of every pointer, so there are auxiliary
> type bits on the object itself.

There are *never* enough tag bits to describe the type of every
pointer, unless you're willing to blow an extra word on it! No LispM I
know of did this. If you do blow an extra word on it then you're
doomed because you've just doubled memory bandwidth.

And if you really want this then an ordinary processor can do it
anyway! A 2-way superscalar machine with enough memory bandwidth and
some speculative execution ought to be able to quite easily issue tag
checks in parallel with other operations on objects just by keeping
the tag information next to the pointer (or in the high bits of a
64bit pointer).
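
A sketch of the "tag in the high bits of a 64-bit pointer" variant, in Python for illustration. The 8-bit tag width and the helper names are assumptions of this example, not a description of any real system.

```python
TAG_SHIFT = 56                         # hypothetical: top 8 bits hold the tag
ADDRESS_MASK = (1 << TAG_SHIFT) - 1


def tag_pointer(address, tag):
    assert 0 <= address <= ADDRESS_MASK and 0 <= tag < 256
    return (tag << TAG_SHIFT) | address


def tag_of(word):
    return word >> TAG_SHIFT


def address_of(word):
    return word & ADDRESS_MASK
```

`tag_of` and `address_of` depend only on the word itself and not on each other, which is what would let a superscalar processor issue the tag check alongside the operation on the address.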

> The big problem with custom hardware is keeping up with Intel. I have a
> Pentium II clocking at 200 MHz and I hear that we'll have 1GHz chips by
> the end of the year. A 1 GHz Pentium can execute a *lot* of instructions.
> It can afford to waste time boxing and unboxing bits and shuffling them here
> and there. It would probably compete with a 100MHz custom lisp processor.
> In two years, a 1GHz Pentium is going to look pathetically slow.

Well, that's just the RISC argument isn't it, except it turns out that
CISCy chips like x86 can win too if you throw enough money at them.
That's what killed the Vax.

--tim

Barry Margolin

Jan 21, 2000

In article <hmLh4.900$Rj5....@dfw-read.news.verio.net>,
Joe Marshall <jmar...@alum.mit.edu> wrote:

>The type codes on stock hardware have to be contorted to `fit in'.
>Typically, the bottom two or three bits of a word are used to indicate
>whether the object is an immediate or a pointer. However, there aren't
>enough bits to describe the type of every pointer, so there are auxiliary
>type bits on the object itself. This means chasing down a pointer whenever
>you need to do some operation dependent upon the object's type. (Cons
>cells are so important that one of the `primary' tags is used to represent
>them.)

Another type-related thing that Lisp hardware did was make use of parallel data
paths. When you performed an arithmetic operation, the operands were sent
to the integer ALU, the floating point ALU, and the type checking microcode
simultaneously. The type checking microcode determined whether to gate the
result of one of the two ALUs to the output, or to invoke a microcode or
main CPU subroutine to perform higher-level processing (e.g. coercion of
one of the arguments, bignum arithmetic). This means that simple integer
or FP arithmetic runs at full speed, with no type checking overhead.
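
A software model of that datapath, as a hedged Python sketch. Real hardware does these steps simultaneously; here they are simply written one after another. Operands are modelled as `(tag, payload)` pairs with finite payloads, and `wrap32`, `slow_path_add` and the `"generic"` result tag are inventions of this example.

```python
def wrap32(n):
    """Wrap an integer to signed 32 bits, as a fixed-width adder would."""
    return ((n + (1 << 31)) % (1 << 32)) - (1 << 31)


def slow_path_add(a, b):
    # Stand-in for the microcode or main-CPU subroutine (coercion, bignums).
    return ("generic", a[1] + b[1])


def tagged_add(a, b):
    # Both "ALUs" produce a result unconditionally, whatever the operand types.
    int_sum = wrap32(int(a[1]) + int(b[1]))
    float_sum = float(a[1]) + float(b[1])

    # The type check only decides which result, if either, is gated out.
    tags = (a[0], b[0])
    if tags == ("fixnum", "fixnum") and int_sum == a[1] + b[1]:
        return ("fixnum", int_sum)
    if tags == ("float", "float"):
        return ("float", float_sum)
    return slow_path_add(a, b)
```

The fixnum and float cases never wait on the type check to start computing, which is the "no type checking overhead" property; mixed types and overflow fall through to the slow path.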

--
Barry Margolin, bar...@bbnplanet.com
GTE Internetworking, Powered by BBN, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

Tim Bradshaw

Jan 21, 2000

* Barry Margolin wrote:

> Another type-related thing that Lisp hardware did was make use of parallel data
> paths. When you performed an arithmetic operation, the operands were sent
> to the integer ALU, the floating point ALU, and the type checking microcode
> simultaneously. The type checking microcode determined whether to gate the
> result of one of the two ALUs to the output, or to invoke a microcode or
> main CPU subroutine to perform higher-level processing (e.g. coercion of
> one of the arguments, bignum arithmetic). This means that simple integer
> or FP arithmetic runs at full speed, with no type checking overhead.

As far as I can see this ought to be equally possible on a superscalar
CPU which can issue 2 operations at once and do some speculative
execution. At least, I'm unaware of anything that would prevent this
being done. You just need to arrange things so that one execution
unit is doing the type check while others are (speculatively)
executing the various good-case operations.

I'd be interested to know if real compilers do manage to arrange life
so this happens. I guess the advantage of a VLIW machine would be
that this is very obvious in the instructions rather than relying on
the processor to spot the parallelism dynamically.

This is also a good case of how nothing comes for free -- the `free'
parallel typechecking is costing an execution unit which you could
(with luck) keep busy doing something else if it wasn't typechecking.
I suppose a dedicated Lisp processor could use a much simpler unit to
do this, but it would then lose if a smart compiler managed to omit a
substantial number of the typechecks, because it might not be useful
for anything else so you'd have all these expensive transistors idle...

--tim

Joe Marshall

Jan 21, 2000

Robert Monfera <mon...@fisec.com> wrote in message
news:38878AC7...@fisec.com...

>
> Joe Marshall wrote:
>
> > (Cons cells are so important that one of
> > the `primary' tags is used to represent them.)
>

> While not debating your point, I'd like to mention that it is not
> necessarily true.
>
> If you have an array of integers and you know all of them will be
> integers, one bit is enough to let the operator know whether the rest of
> the bits is a 31-bit fixnum or a pointer to a bignum. If it is a
> pointer, you even have at least one spare bit (as you address words, not
> bytes), which you could use to differentiate between 'mednums' (e.g., 64
> or 80 bit hardware or software integers or even floats with 50-something
> useful bits) and very bignums (arbitrary length beyond 64 or 80 bits).

I mistyped. I should have said "Cons cells are so important that
OFTEN one of the `primary' tags is used to represent them." I'm pretty
sure Lucid and Franz both did/do this, and I bet most of the other Lisp
implementations do so as well.

Of course, you can re-arrange or eliminate your tags in certain restricted
contexts (like in an array of integers). This will complicate the compiler,
which must prove that the code abides by the restriction, and it may
complicate the GC which must somehow be told of exceptional cases.
It is likely to make life more complicated for the developer as well because
without declarations, the compiler has little to work with.

I have seen a lot of cases where compilers can optimize floating point
arrays (which take a nasty performance hit when boxing and unboxing), and
even the LispMs optimized packed integer arrays, but I haven't seen any
system that did something as complicated as you described above (not that
you couldn't, but that no one does.)

> (This can be done by the user of the language, who will thus become more
> of an implementor.)
>

> > A 1 GHz Pentium can execute a *lot* of instructions.
> > It can afford to waste time boxing and unboxing bits and shuffling them
> > here and there.
>

> You are right: based on benchmarks, if the bits are already in a
> register, a couple of wasted shifting or masking instructions do not
> make a huge difference, as speed will be largely determined by how fast
> you can get data from memory to cache.
>
> This is why a fine resolution of compiler-optimizable types may be
> preferable over a custom chip - instead of pulling in 16-24 bytes for a
> bignum, maybe 8 bytes will do, and they could even be immediate values
> in a properly declared array.

There's no reason you couldn't combine compiler-optimizable types
with a custom chip. In fact, a custom chip could `automagically'
distinguish between some of these types at runtime and optimize them
on the fly.

Joe Marshall

Jan 21, 2000

Tim Bradshaw <t...@cley.com> wrote in message
news:ey3ya9k...@cley.com...

> * Joe Marshall wrote:
>
> > A `read barrier' is necessary to implement `incremental scavenging', all
> > the Lisp Machines had hardware to check for pointers to oldspace when
> > reading from memory. Most implementations of Lisp on stock hardware
> > simply punt assuming that `generational scavenging' will be fast enough.
> > (There may be counter examples, but the ones I have first-hand knowledge
> > about, do not implement a read barrier.)
>
> And this is generally a correct decision except perhaps for
> hard-real-time cases. My benchmarking of reasonably carefully-written
> Lisp programs indicates that they spend under 10% of their time in GC.
> So however much you increase GC performance it's only a 10% win.

The `read barrier' generally applies to *every* read the system does, not
just the ones during GC. A lot of stock hardware systems use the page
fault hardware as a `write barrier' accelerator to avoid taking a hit on
every write.
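
For comparison, the page-fault trick for write barriers can be modelled like this. It is a Python simulation only: no real MMU is involved, and `OldGeneration`, `PAGE_SIZE` and the `faults` counter are names made up for the sketch.

```python
PAGE_SIZE = 4096


class OldGeneration:
    def __init__(self, num_pages):
        self.memory = {}
        self.protected = set(range(num_pages))   # every page starts write-protected
        self.dirty = set()                       # pages the collector must rescan
        self.faults = 0

    def write(self, address, value):
        page = address // PAGE_SIZE
        if page in self.protected:
            # Stand-in for the MMU trap plus the collector's fault handler:
            # note the page as dirty, then unprotect it.
            self.faults += 1
            self.protected.discard(page)
            self.dirty.add(page)
        self.memory[address] = value
```

Only the first write to each page pays for a trap; later writes to the same page run at full speed, which is why this works for writes. A read barrier has no equivalent shortcut in this model, because what matters there is the value being read rather than the page it is read from.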

> > The type codes on stock hardware have to be contorted to `fit in'.
> > Typically, the bottom two or three bits of a word are used to indicate
> > whether the object is an immediate or a pointer. However, there aren't
> > enough bits to describe the type of every pointer, so there are
> > auxiliary type bits on the object itself.
>

> There are *never* enough tag bits to describe the type of every
> pointer, unless you're willing to blow an extra word on it! No LispM I
> know of did this. If you do blow an extra word on it then you're
> doomed because you've just doubled memory bandwidth.

That's true, but you don't need to accelerate every type of pointer.
Being able to distinguish between about a dozen types without
dereferencing is where the LispM stood (5 or 6 bits of type info).
On a 32-bit machine, you either have to move some of the type info
to the manifest object (or a bibop table), or discard parts of your
address space. The latter solution has the additional problem that
the processor will be spending a lot of time extracting tags when it
could be doing `useful' work.
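
For readers who have not met the term, a bibop ("big bag of pages") table keeps type information per page rather than per pointer. A minimal Python sketch, with a made-up 4 KB page size and hypothetical helper names:

```python
PAGE_SHIFT = 12          # assumed 4 KB pages for this sketch

page_type = {}           # page number -> type of every object on that page


def allocate_page(page_number, type_name):
    page_type[page_number] = type_name


def type_of_pointer(address):
    # No tag bits in the pointer at all: the page it points into decides.
    return page_type[address >> PAGE_SHIFT]
```

The pointer keeps all of its address bits, at the price of a table lookup for every type test.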

> And if you really want this then an ordinary processor can do it
> anyway! A 2-way superscalar machine with enough memory bandwidth and
> some speculative execution ought to be able to quite easily issue tag
> checks in parallel with other operations on objects just by keeping
> the tag information next to the pointer (or in the high bits of a
> 64bit pointer).

That's ordinary?

What I'm arguing is that if you built a 2-way superscalar machine with
a fat memory pipe and some speculative execution, *and* made sure
that there was extra hardware support for doing the type of things that
Lisp does, you would outperform the same sort of architecture that
doesn't have Lisp support.

Of course, if you have a clever enough chip, wide enough datapaths,
some spare bits in the word, some parallel type processing, etc. in
your `ordinary' processor, then sure, you have a Lisp processor that
isn't marketed as one.

However I don't think any of the x86 chips fall into this category.

>
> > The big problem with custom hardware is keeping up with Intel. I have a
> > Pentium II clocking at 200 MHz and I hear that we'll have 1GHz chips by
> > the end of the year. A 1 GHz Pentium can execute a *lot* of instructions.
> > It can afford to waste time boxing and unboxing bits and shuffling them
> > here and there. It would probably compete with a 100MHz custom lisp
> > processor. In two years, a 1GHz Pentium is going to look pathetically slow.
>
> Well, that's just the RISC argument isn't it, except it turns out that
> CISCy chips like x86 can win too if you throw enough money at them.
> That's what killed the Vax.

It's Moore's law meets money. Any processor, CISC or RISC, will outperform
an instance of the other in a few years. All you have to do is keep people
from switching to your competition until you catch up to them.

~jrm

Joe Marshall

Jan 21, 2000

Andy Freeman <ana...@earthlink.net> wrote in message
news:868csb$1i6$1...@nnrp1.deja.com...
> In article <hmLh4.900$Rj5....@dfw-read.news.verio.net>,
> "Joe Marshall" <jmar...@alum.mit.edu> wrote:
> > A `read barrier' is necessary to implement `incremental scavenging',
> > all the Lisp Machines had hardware to check for pointers to oldspace
> > when reading from memory.
>

> I can't think of a 32 bit processor designed since the 68010 which
> doesn't have hardware which can be used to implement a read barrier.
> Even the processor in Sega's previous generation game box (a
> processor which is also used in cell phones) has the relevant
> hardware. The problem has been OS support, not HW support.

I know how to trick the MMU to do a write barrier, but I'm unfamiliar
with how to get a read barrier. You need to cause a trap when
certain bit patterns are read (those indicating a pointer type, and
those indicating that the address pointed to is in oldspace).
Could you elaborate?

> Sparc has tag support designed with input from lisp folk - I don't
> know whether it actually helped.

The Sparc had the ability to treat the bottom two bits as a tag during
integer arithmetic. I don't recall any other acceleration.

> (Hmm - I'll bet that you can
> use an MMU to do much of the tag stuff, and I'm sure that I'm
> not the first to notice that.)

The LispM read and write barriers were part of the LispM MMU.

> > The LMI K-machine would have competed favorably with a Pentium I.
> > However the K-machine was built in 1986 and was made almost completely
> > in stock TTL.
>
> At the LMI's price point, there were lots of things in '86 which would
> have competed with a Pentium I, which wasn't all that interesting.
> (The PI is the first dual issue x86, but it's completely static,
> like a 486. The world changed with PPro, which started today's
> "everything is dynamic" trend in x86 land.) For example, the
> 360/91 microarchitecture (from the late 60s) had fewer gates
> than any of the Lispms and, given the Lispms' implementation
> technology, would have really screamed.

My point was that with the architecture tuned to the problem, you can
run damn fast.

> The processor world was
> very strange in the 80s.

That's for sure.

~jrm

Joe Marshall

Jan 21, 2000, 3:00:00 AM

Tim Bradshaw <t...@cley.com> wrote in message

news:ey3puuv...@cley.com...

>
> This is also a good case of how nothing comes for free -- the `free'
> parallel typechecking is costing an execution unit which you could
> (with luck) keep busy doing something else if it wasn't typechecking.
> I suppose a dedicated Lisp processor could use a much simpler unit to
> do this, but it would then lose if a smart compiler managed to omit a
> substantial number of the typechecks, because it might not be useful
> for anything else so you'd have all these expensive transistors idle...

For the K machine, the type checking execution unit was a RAM. Quite
a bit simpler than a floating point unit.

I've never seen a `smart' compiler that didn't need clues from a `smart'
programmer.

Raymond Toy

Jan 21, 2000, 3:00:00 AM

>>>>> "Andy" == Andy Freeman <ana...@earthlink.net> writes:

Andy> Sparc has tag support designed with input from lisp folk - I don't
Andy> know whether it actually helped. (Hmm - I'll bet that you can

CMUCL uses the Sparc tag support. However, it seems that it's only
used in two ways: (1) verifying that a pointer is a fixnum (mostly for
safe code), and (2) checking that a fixnum addition or subtraction
still fits in a fixnum, making a bignum out of the result otherwise.
(There is one other use of the tag support that I don't quite
understand.)

Given CMUCL's good type inference, (2) is removed if the result is
known to be a fixnum.
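
To make the two uses concrete, here is a rough sketch of a tagged add in
the spirit of the Sparc support: the low two bits of a word are the tag,
fixnums carry tag 00, and the add reports overflow if either operand is
not a fixnum or if the signed 32-bit sum does not fit. This is an
illustration only; it is not CMUCL code, and the function names are made
up for the example.

```python
# Sketch of a tagged fixnum add on 32-bit words (illustrative, not CMUCL code).

WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1
TAG_MASK = 0b11  # fixnums have 00 in the low two bits


def to_signed(word):
    """Interpret a 32-bit word as a two's-complement integer."""
    return word - (1 << WORD_BITS) if word & (1 << (WORD_BITS - 1)) else word


def box_fixnum(n):
    """Encode integer n as a tagged 32-bit word (value shifted left by 2)."""
    return (n << 2) & WORD_MASK


def unbox_fixnum(word):
    """Recover the integer from a fixnum word."""
    return to_signed(word) >> 2


def tagged_add(a, b):
    """Return (result_word, overflow_flag).

    The flag is set if either operand has a non-fixnum tag or if the
    signed 32-bit addition overflows.
    """
    tag_fault = (a | b) & TAG_MASK != 0
    total = to_signed(a) + to_signed(b)
    limit = 1 << (WORD_BITS - 1)
    arith_overflow = not (-limit <= total < limit)
    return total & WORD_MASK, tag_fault or arith_overflow


def generic_add(a, b):
    """Fast path for fixnums; slow path only when the flag is set."""
    result, overflow = tagged_add(a, b)
    if not overflow:
        return ("fixnum", unbox_fixnum(result))
    if (a | b) & TAG_MASK == 0:
        # Both operands were fixnums, so the sum just did not fit.
        return ("bignum", unbox_fixnum(a) + unbox_fixnum(b))
    raise TypeError("operand is not a fixnum")
```

With "normal" instructions the same fast path costs an extra mask-and-test
plus an overflow branch, which is why replacing the tagged instructions
might not change much.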

My gut feel is that if all of these were replaced with "normal"
instructions, you wouldn't see much difference in performance.

Ray

Tim Bradshaw

Jan 21, 2000, 3:00:00 AM

* Joe Marshall wrote:

> For the K machine, the type checking execution unit was a RAM. Quite
> a bit simpler than a floating point unit.

But to be any use that RAM must be performing at register speeds, so
it needs to be basically level-1 cache or better (I think typical L1
caches have slightly longer latencies than registers). So you're
just directly trading this stuff against cache. I'll take cache!

--tim

Tim Bradshaw

Jan 21, 2000, 3:00:00 AM

* Joe Marshall wrote:

>> And if you really want this then an ordinary processor can do it
>> anyway! A 2-way superscalar machine with enough memory bandwidth and
>> some speculative execution ought to be able to quite easily issue tag
>> checks in parallel with other operations on objects just by keeping
>> the tag information next to the pointer (or in the high bits of a
>> 64bit pointer).

> That's ordinary?

Absolutely. The original Pentium was 2-way superscalar. I think most
modern processors are 4- or 8-way superscalar. For instance the sparc
IIi (not a particularly fast processor, but I found the info quickly)
is described as:

4-way SuperScalar Design with 9 Execution Units
- 4 Integer Execution Units
- 3 Floating-Point Execution Units
- 2 Graphics Execution Units

--tim

Joe Marshall

Jan 21, 2000, 3:00:00 AM

Tim Bradshaw <t...@cley.com> wrote in message

news:ey3wvp3...@cley.com...

Actually it has to be no slower than the ALU stage of the pipeline.
The register speed has to be twice as fast if it supports a read/write
cycle for every operation.

Our ALU was the AMD 29332. Although it was supposed to have a 30ns
max time from stable input to stable output, the integer overflow bit
didn't stabilize until 47ns after stable input. The RAM was 4K by
1 bit, and it had no problem coming up with a go/no-go by then.
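
A rough sketch of type checking by table lookup may help here. Only the
4K-by-1-bit size and the idea of a type-check opcode in the instruction
come from this thread; how the address bits were actually split is not
stated, so the 4-bit tag fields, the tag values, and the opcode names
below are all hypothetical.

```python
# Type check as a 1-bit-wide lookup table (illustrative field layout).

CHECK_OP_BITS = 3    # a 3-bit type-check opcode per instruction
TAG_FIELD_BITS = 4   # hypothetical width of each operand's type tag
RAM_SIZE = 4096      # 4K by 1 bit

TAG_FIXNUM, TAG_CONS, TAG_FLOAT = 0, 1, 2   # hypothetical tag values
CHECK_NONE, CHECK_BOTH_FIXNUM = 0, 1        # hypothetical check opcodes


def ram_address(check_op, tag_a, tag_b):
    """Concatenate the opcode and both tags into a RAM address."""
    return (check_op << (2 * TAG_FIELD_BITS)) | (tag_a << TAG_FIELD_BITS) | tag_b


def build_ram():
    """Fill the table once; a 1 means 'go', a 0 means 'no-go' (trap)."""
    ram = [0] * RAM_SIZE
    for tag_a in range(1 << TAG_FIELD_BITS):
        for tag_b in range(1 << TAG_FIELD_BITS):
            ram[ram_address(CHECK_NONE, tag_a, tag_b)] = 1
    ram[ram_address(CHECK_BOTH_FIXNUM, TAG_FIXNUM, TAG_FIXNUM)] = 1
    return ram


def type_check(ram, check_op, tag_a, tag_b):
    """One lookup, done in parallel with the ALU operation in hardware."""
    return ram[ram_address(check_op, tag_a, tag_b)] == 1
```

The point of the design is that the check is a single indexed read with no
logic beyond wiring the tag and opcode bits to the address lines, so it
only has to settle before the slowest ALU output does.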

>
> --tim

Joe Marshall

Jan 21, 2000, 3:00:00 AM

Tim Bradshaw <t...@cley.com> wrote in message

news:ey3u2k7...@cley.com...

Actually I was objecting to the extra 32-bits of tag information in the
high bits of the 64-bit pointer.

It is true that some of the stock hardware out there has some support that
could be co-opted for Lisp. But I still believe that *designing* support
for Lisp will get you further. Suppose we punted on one of the `graphics
execution units' and instead built a `function calling unit' or a `generic
function dispatch unit'. If we tailor the hardware to the problem, we'll
get better performance.

The K machine had 3 bits of type-check opcode in every instruction. An
x86 simply doesn't have that, so you'd have to add a fair amount of `bloat'
to the instruction stream to accomplish the same thing with the stock
hardware.

Harley Davis

Jan 21, 2000, 3:00:00 AM

Joe Marshall <jmar...@alum.mit.edu> wrote in message
news:hmLh4.900$Rj5....@dfw-read.news.verio.net...

> The big problem with custom hardware is keeping up with Intel. I have a
> Pentium II clocking at 200 MHz and I hear that we'll have 1GHz chips by
> the end of the year. A 1 GHz Pentium can execute a *lot* of instructions.
> It can afford to waste time boxing and unboxing bits and shuffling them
> here and there. It would probably compete with a 100MHz custom lisp
> processor.
> In two years, a 1GHz Pentium is going to look pathetically slow.
>

> The second big problem is you can't play Doom on a Lisp machine.

#2 is really the big issue in terms of making custom Lisp hardware an
economically viable enterprise.

-- Harley

Chuck Fry

Jan 21, 2000, 3:00:00 AM

In article <f62i4.1671$Rj5....@dfw-read.news.verio.net>,

Joe Marshall <jmar...@alum.mit.edu> wrote:
>
>Actually I was objecting to the extra 32-bits of tag information in the
>high bits of the 64-bit pointer.

Why should object references on a 64-bit machine contain only 32 bits of
address?
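
As a sketch of the alternative, a 64-bit reference can carry a small tag
in its top bits and still leave far more than 32 bits of address. The
8-bit tag width here is an arbitrary choice for illustration, not a
proposal from anyone in this thread.

```python
# 64-bit object reference with a small tag in the high bits (illustrative).

WORD_BITS = 64
TAG_BITS = 8                          # hypothetical tag width
ADDRESS_BITS = WORD_BITS - TAG_BITS   # leaves 56 bits of address
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1


def pack(tag, address):
    """Combine a tag and an address into one 64-bit word."""
    if not 0 <= tag < (1 << TAG_BITS):
        raise ValueError("tag does not fit in the tag field")
    if not 0 <= address <= ADDRESS_MASK:
        raise ValueError("address does not fit in the address field")
    return (tag << ADDRESS_BITS) | address


def tag_of(word):
    """Extract the tag with one shift."""
    return word >> ADDRESS_BITS


def address_of(word):
    """Extract the address with one mask."""
    return word & ADDRESS_MASK
```

On stock hardware the mask before each dereference is the extra cost; the
tag check itself can be issued alongside the memory access.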

>It is true that some of the stock hardware out there has some support that
>could be co-opted for Lisp. But I still believe that *designing* support
>for Lisp will get you further. Suppose we punted on one of the `graphics
>execution units' and instead built a `function calling unit' or a `generic
>function dispatch unit'. If we tailor the hardware to the problem, we'll
>get better performance.

Or we could meet in the middle. Not all of the operations done in a
Lispm needed exotic hardware support. You could get a lot of bang for
the buck with relatively simple GC support hardware, for instance. I
seem to recall Moon and others (Cyphers?) at Symbolics wrote a paper on
the topic of minimal hardware support for Lisp around the time the Ivory
architecture was devised.

The idea of using some of the inherent parallelism in a superscalar or
VLIW architecture to compensate for the relatively slow memory also
appeals very strongly to me. Pointer-chasing is inherently limited by
memory latency, so (instruction set architecture permitting) why not
launch the memory read early, and do a few extra checks while you're
waiting for the data to come back?

Or (heresy alert!) it might be worth redesigning internal representations
to better deal with the memory latency and caching issues. Today's
implementations do a fair job of this, but techniques like CDR-coding
might bear a second look.
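
For readers who have not met CDR-coding: each word carries a small cdr
code alongside the car, so a list laid out in consecutive words needs no
separate cdr pointers. A minimal sketch follows; the code names follow
Lisp Machine usage as I recall it, while the numeric values and the
tuple-per-word memory model are just assumptions for the example.

```python
# Minimal model of CDR-coding. Memory is a list of (cdr_code, contents) words.

CDR_NORMAL = 0  # the next word holds the cdr (an address, or NIL)
CDR_NEXT = 1    # the cdr is the cell at the next address
CDR_NIL = 2     # the cdr is NIL; this is the last cell
CDR_ERROR = 3   # second word of a CDR_NORMAL pair; not a cell by itself
NIL = None


def compact_list(items):
    """Lay a non-empty list out in consecutive words, one word per element."""
    last = len(items) - 1
    return [(CDR_NIL if i == last else CDR_NEXT, item)
            for i, item in enumerate(items)]


def car(memory, addr):
    return memory[addr][1]


def cdr(memory, addr):
    code, _ = memory[addr]
    if code == CDR_NIL:
        return NIL
    if code == CDR_NEXT:
        return addr + 1
    if code == CDR_NORMAL:
        return memory[addr + 1][1]  # follow the explicit cdr pointer
    raise ValueError("address is not the start of a cell")


def to_python_list(memory, addr):
    """Walk a list starting at addr and collect its elements."""
    out = []
    while addr is not NIL:
        out.append(car(memory, addr))
        addr = cdr(memory, addr)
    return out
```

A three-element list takes three words this way instead of six, and
walking it touches consecutive addresses, which is the cache-friendliness
argument for giving the technique a second look.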

You mention a "generic function dispatch unit". Maybe what's needed is
a less generic object system! There are a host of representational and
algorithmic tradeoffs that could be made.

>The K machine had 3-bits of type-check opcode in every instruction. A
>x86 simply doesn't have that, so you'd have to add a fair amount of `bloat'
>to the instruction stream to accomplish the same thing with the stock
>hardware.

Do you really need the type-checking for *every* instruction? I would
wager you don't.

-- Chuck, computer/language architect wannabe
--
Chuck Fry -- Jack of all trades, master of none
chu...@chucko.com (text only please) chuc...@home.com (MIME enabled)
Lisp bigot, car nut, photographer, sometime guitarist and mountain biker
The addresses above are real. All spammers will be reported to their ISPs.

Tim Bradshaw

Jan 22, 2000, 3:00:00 AM

* Joe Marshall wrote:
> The big problem with custom hardware is keeping up with Intel. I have a
> Pentium II clocking at 200 MHz and I hear that we'll have 1GHz chips by
> the end of the year. A 1 GHz Pentium can execute a *lot* of instructions.
> It can afford to waste time boxing and unboxing bits and shuffling them here
> and there. It would probably compete with a 100MHz custom lisp processor.
> In two years, a 1GHz Pentium is going to look pathetically slow.

I've been feeling vaguely uncomfortable about this claim, and I think
I now know why.

I need to make some simplifying assumptions. These are all bogus but
they're commonly made, and perhaps are true to first order.

1. Assume that it's possible to talk about 'language x running
so-many times as fast as language y on given hardware'.
Slightly less bogusly this could be taken to mean `for the
best (possible?) implementation of languages x and y and
for equivalent programs'.

2. Assume that performance of a given processor goes like
clock speed. So assume that the memory system on a 1GHz
machine is 10x as fast as a 100MHz implementation and so
on. This is obviously extremely questionable as well, but
it's what is widely believed.

OK, so what you are claiming (rephrased) is:

A 100MHz lisp processor running a lisp program competes with
a 1GHz pentium running the same lisp program.

Or (using (2) above): a lisp processor can run lisp about 10x
as fast as a pentium.

Another assumption I need to make (based on (1) above and experimental
evidence):

3. Current lisp implementations on stock hardware (including
pentiums) run about 1/2 the speed of C implementations on
the same hardware.

One could dispute the 1/2, but it's somewhere around that figure. In
my experience it's actually rather closer to 1 -- 1/2 is a rather
pessimistic estimate I think.

Final assumption:

4. We can clock a lisp chip at the same speed as a pentium for
about the same expenditure of money on development, and
that the resulting chip will not be ludicrously expensive
-- so will have about the same transistor count, not require
boutique implementation technologies or exotic memory
interfaces (I don't count non-power-of-two word sizes as
`exotic').

If this isn't true then the lisp chip can obviously never win.

OK, from all this I get:

We can design a lisp processor which will run Lisp about 5x as
fast as a pentium will run C, at the same clock rate. And if
the market was equivalently large, this chip would cost about
the same as a pentium.
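
The 5x figure is just the two ratios multiplied together. Spelled out,
using only the numbers in assumptions (2) and (3) and the original claim
(the variable names are mine):

```python
# Worked arithmetic behind the "about 5x" conclusion.
from fractions import Fraction

pentium_clock_mhz = 1000
lisp_chip_clock_mhz = 100

# Claim plus assumption (2): breaking even at a 10x clock deficit means the
# Lisp chip runs Lisp 10x as fast per clock as the Pentium runs Lisp.
lisp_chip_vs_pentium_lisp = Fraction(pentium_clock_mhz, lisp_chip_clock_mhz)

# Assumption (3): Lisp on a Pentium runs at about 1/2 the speed of C on it.
pentium_lisp_vs_pentium_c = Fraction(1, 2)

# Lisp on the Lisp chip relative to C on the Pentium, at equal clock rates.
lisp_chip_vs_pentium_c = lisp_chip_vs_pentium_lisp * pentium_lisp_vs_pentium_c

print(lisp_chip_vs_pentium_c)  # 5
```

If the 1/2 in assumption (3) is taken closer to 1, as suggested above, the
result moves toward 10x rather than 5x.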

This is a conclusion that I find very uncomfortable indeed. As far as
I can see this means either that, in some deep sense, Lisp is a
language which can be made to go much faster than C, or that in fact
the pentium is a very poor `C machine', and much faster C machines
could be designed. Neither option seems very likely to me.

--tim

Paul Wallich

Jan 22, 2000, 3:00:00 AM

In article <ey3g0vq...@cley.com>, Tim Bradshaw <t...@cley.com> wrote:

>* Joe Marshall wrote:
>> The big problem with custom hardware is keeping up with Intel. I have a
>> Pentium II clocking at 200 MHz and I hear that we'll have 1GHz chips by
>> the end of the year. A 1 GHz Pentium can execute a *lot* of instructions.
>> It can afford to waste time boxing and unboxing bits and shuffling them here
>> and there. It would probably compete with a 100MHz custom lisp processor.
>> In two years, a 1GHz Pentium is going to look pathetically slow.
>
>I've been feeling vaguely uncomfortable about this claim, and I think
>I now know why.
>
>I need to make some simplifying assumptions. These are all bogus but
>they're commonly made, and perhaps are true to first order.
>
> 1. Assume that it's possible to talk about 'language x running
> so-many times as fast as language y on given hardware'.
> Slightly less bogusly this could be taken to mean `for the
> best (possible?) implementation of languages x and y and
> for equivalent programs'.

"Equivalent programs" is an awfully difficult one -- there's a big difference
between "the exact same code, transliterated" and "equivalent algorithms"
and "equivalent functionality". One of the things that a lot of people have
said is that their old Lispm's are _still_ more responsive than stock hardware
boxes even when various thing-crunching benchmarks show the Lispm to be
slower at chugging code.

> 2. Assume that performance of a given processor goes like
> clock speed. So assume that the memory system on a 1GHz
> machine is 10x as fast as a 100MHz implementation and so
> on. This is obviously extremely questionable as well, but
> it's what is widely believed.

This is one of the things that could easily cost you a factor of 5...

>OK, so what you are claiming (rephrased) is:
>
> A 100MHz lisp processor running a lisp program competes with
> a 1GHz pentium running the same lisp program.
>
> Or (using (2) above): a lisp processor can run lisp about 10x
> as fast as a pentium.

See the caveat above, in particular noting that the lisp processor
will theoretically be running a LispOS.

>Another assumption I need to make (based on (1) above and experimental
>evidence):
>
> 3. Current lisp implementations on stock hardware (including
> pentiums) run about 1/2 the speed of C implementations on
> the same hardware.
>
>One could dispute the 1/2, but it's somewhere around that figure. In
>my experience it's actually rather closer to 1 -- 1/2 is a rather
>pessimistic estimate I think.
>
>Final assumption:
>
> 4. We can clock a lisp chip at the same speed as a pentium for
> about the same expenditure of money on development, and
> that the resulting chip will not be ludicrously expensive
> -- so will have about the same transistor count, not require
> boutique implementation technologies or exotic memory
> interfaces (I don't count non-power-of-two word sizes as
> `exotic').
>
>If this isn't true then the lisp chip can obviously never win.

This is another assumption that I think is fundamentally flawed --
if you're going to wider words or extra functional units or whatever,
either your transistor budget will go up or your speed will go down
or both. And if you trade off "lispy" functional units in the processor
vs "pentium-ish" ones, you will lower performance as well. So figure
more like a factor of two here. You will be getting more done (fsvo "more")
in each clock, but I doubt you'll get the same clock rate.

>OK, from all this I get:
>
> We can design a lisp processor which will run Lisp about 5x as
> fast as a pentium will run C, at the same clock rate. And if
> the market was equivalently large, this chip would cost about
> the same as a pentium.
>
>This is a conclusion that I find very uncomfortable indeed. As far as
>I can see this means either that, in some deep sense, Lisp is a
>language which can be made to go much faster than C, or that in fact
>the pentium is a very poor `C machine', and much faster C machines
>could be designed. Neither option seems very likely to me.

Both of these are in some sense true, especially if you play with the
meanings of the terms a little. For some static definition of "C code"
there are certainly already chips with smaller transistor budgets than
the pentium that blow its socks off, either in a straight SPECmark
sense or in performance vs area/power consumption/etc. The pentium
has the design constraint of running legacy code under a particularly
crufty OS, and although its designers have done quite well with the
billions of dollars they've thrown at the problem, it's nowhere near optimal.

Meanwhile, what the creaking survival of the old Lispm's shows is that
for certain kinds of tasks, Lisp (and especially a LispOS) can run much
faster than C, because it executes much shorter instruction sequences
to perform the same user-level function. You're not constantly throwing
information away and then spending valuable machine time re-extracting
it from opaque sources. You might be able to take any particular static
chunk of Lisp code and translate it into equivalent C, but it's unlikely
that you could build an entire functional equivalent in C from scratch.
It's a little like Fred Brooks's bit about how adding people to a late project
makes it even later.

paul

Tim Bradshaw

Jan 22, 2000, 3:00:00 AM

I think I'm going to give up after this, but.

* Paul Wallich wrote:

> "Equivalent programs" is an awfully difficult one -- there's a big difference
> between "the exact same code, transliterated" and "equivalent algorithms"
> and "equivalent functionality". One of the things that a lot of people have
> said is that their old Lispm's are _still_ more responsive than stock hardware
> boxes even when various thing-crunching benchmarks show the Lispm to be
> slower at chugging code.

They're `more responsive' because they have a better user interface.
They are *not* more responsive because they are in any sense faster,
and if anyone claims that, they're not telling the truth. I have a
Symbolics NXP1000 sitting right here, and believe me it is *not*
faster than my Sun, not even close. It's not even faster than the IPX
(40MHz SPARC?) we use as an X terminal. I can't test, but it's
probably not as fast as a late 68K Sun.

> Meanwhile, what the creaking survival of the old Lispm's shows is that
> for certain kinds of tasks, Lisp (and especially a LispOS) can run much
> faster than C, because it executes much shorter instruction sequences
> to perform the same user-level function.

No, it doesn't show that. It shows that (a) the environment is very
good, and (b) there is a bunch of legacy code which is significantly
hard to port.

--tim

Paul Wallich

Jan 22, 2000, 3:00:00 AM

In article <ey3bt6d...@cley.com>, Tim Bradshaw <t...@cley.com> wrote:

>I think I'm going to give up after this, but.
>
>* Paul Wallich wrote:
>

>> "Equivalent programs" is an awfully difficult one -- there's a big difference
>> between "the exact same code, transliterated" and "equivalent algorithms"
>> and "equivalent functionality". One of the things that a lot of people have
>> said is that their old Lispm's are _still_ more responsive than stock
>> hardware boxes even when various thing-crunching benchmarks show the Lispm
>> to be slower at chugging code.
>

>They're `more responsive' because they have a better user interface.
>They are *not* more responsive because they are in any sense faster,
>and if anyone claims that, they're not telling the truth. I have a
>Symbolics NXP1000 sitting right here, and believe me it is *not*
>faster than my Sun, not even close. It's not even faster than the IPX
>(40MHz SPARC?) we use as an X terminal. I can't test, but it's
>probably not as fast as a late 68K Sun.

Here is where, Imo, things get sticky. If something is more responsive,
that means work gets done faster. So even if the lispm's aren't faster at
executing instructions, they're faster at _something_. I agree with you
completely that that something is not executing code as sensibly defined.

>> Meanwhile, what the creaking survival of the old Lispm's shows is that
>> for certain kinds of tasks, Lisp (and especially a LispOS) can run much
>> faster than C, because it executes much shorter instruction sequences
>> to perform the same user-level function.
>

>No, it doesn't show that. It shows that (a) the environment is very
>good, and (b) there is a bunch of legacy code which is significantly
>hard to port.

Why is that code so hard to port? You would think that with 100x the
number of cycles to play with, it would be almost impossible to _not_
figure out a way to port it. (I know, it's an OS and UI and low-level code
thing, but isn't that sort of the point?)

It may be that when people are talking about a modern-custom lispm
being much faster than a pentium, they are talking about (Perlis's term)
"discrete exportable functionality" rather than mips. Taking that as an
assumption, what would it mean to have a box with (pessimizing here)
25-50X the raw cpu of a lispm, with memory, graphics and disk bandwidth
ditto, and the environment of a lispm with the additions that that kind of
excess power and memory would make possible?

There's an almost endless list of algorithms and techniques that died during
AI winter that would do very nicely with 50x or so more computing. (The
entire field of speech recognition, for example, which is now apparently
viable for dictation on the desktop, is those same 1985 algorithms, maybe 1990
at the outside, with 20x the cycles and 20x the memory.)

paul

Tim Bradshaw

Jan 23, 2000, 3:00:00 AM

* Paul Wallich wrote:

> Here is where, Imo, things get sticky. If something is more responsive,
> that means work gets done faster. So even if the lispm's aren't faster at
> executing instructions, they're faster at _something_. I agree with you
> completely that that something is not executing code as sensibly
> defined.

They are faster at letting you write and develop code (so long as that
code does not have runtimes so long that you wait around all the time
when developing). That kind of `being fast' is nothing to do with
CPU performance.

> Why is that code so hard to port? You would think that with 100x the
> number of cycles to play with, it would be almost impossible to _not_
> figure out a way to port it. (I know, it's an OS and UI and low-level code
> thing, but isn't that sort of the point?)

Because there's oodles of proprietary lispm stuff which it relies on
which simply is not available on stock hardware (unless you count the
alpha ivory emulator). The fact that it's not available on stock
hardware is nothing to do with performance of the processor; it's to do
with intellectual property and it being very hard to port off the old
system because it wasn't particularly written with portability in
mind.

The reason it's not being ported is the lack of human resources to
port it. (Of course, most of it has been ported or has died).

> It may be that when people are talking about a modern-custom lispm
> being much faster than a pentium, they are talking about (Perlis's term)
> "discrete exportable functionality" rather than mips. Taking that as an
> assumption, what would it mean to have a box with (pessimizing here)
> 25-50X the raw cpu of a lispm, with memory, graphics and disk bandwidth
> ditto, and the environment of a lispm with the additions that that kind of
> excess power and memory would make possible?

But having that is nothing to do with hardware. That whole
environment could live on top of any stock-hardware lisp system. I
mean, CLIM already runs on stock HW and is already *significantly*
faster than the symbolics window system was on proprietary HW. What it
requires is someone to invest the necessary millions of dollars that
were invested in the lispm *software* into a new lot of software which
has all this cool functionality, and is brought up to date as well
(web and so on). All of this stuff could live on top of X or Windows
if need be. You don't need some mythical blindingly fast lisp chip to
do that, you just need to type maybe a couple of million lines of code
or something.

--tim

Espen Vestre

Jan 24, 2000, 3:00:00 AM

Tim Bradshaw <t...@cley.com> writes:

> and if anyone claims that, they're not telling the truth. I have a
> Symbolics NXP1000 sitting right here, and believe me it is *not*
> faster than my Sun, not even close. It's not even faster than the IPX
> (40MHz SPARC?) we use as an X terminal. I can't test, but it's
> probably not as fast as a late 68K Sun.

probably about the same speed, I remember that I once tested some
software on a MacIvory and the Mac IIfx (a fast 68030) the MacIvory
was sitting in, running MCL, and the speed was roughly equivalent
as far as I recall (at that time, MCL had some very obvious weak
points, though, e.g. it didn't yet have lightweight processes).
--
(espen)

Erik Naggum

Jan 24, 2000, 3:00:00 AM

* Tim Bradshaw <t...@cley.com>

| OK, from all this I get:
|
| We can design a lisp processor which will run Lisp about 5x as fast
| as a pentium will run C, at the same clock rate. And if the market
| was equivalently large, this chip would cost about the same as a
| pentium.
|
| This is a conclusion that I find very uncomfortable indeed. As far as I
| can see this means either that, in some deep sense, Lisp is a language
| which can be made to go much faster than C, or that in fact the pentium
| is a very poor `C machine', and much faster C machines could be designed.
| Neither option seems very likely to me.

once upon a time, when ANSI C was fairly young and compilers for the
language were new enough to be simple enough for me to understand, I
spent a fair amount of time studying then-current processors and how they
would optimally "represent" C. it turns out that C was a good language
back when PDP-11 was a good processor. C has sucked since then, more and
more so over the years. today, C is probably the crappiest language you
can possibly find, made worse only by its "improvement", C++. neither
has a snowball's chance in hell of being compiled into _optimal_ machine
code on modern processors. the language itself is so low-level it tries
to run on a "virtual" processor that only to the untrained eye looks like
modern processors.

I recently had the opportunity to study the Pentium III from the inside,
and I was positively amazed by what the Intel guys have managed to hide
behind that god-awful instruction set they have to be compatible with.
it's good stuff, and it isn't being used because people are still stuck
on the 386 and refuse to compile programs for 686s.

if you actually _used_ the Pentium III architecture for Lisp, you'd truly
run rings around C, but since Lisp vendors, too, tend to stay with what
is likely to sell well, you can't ask people to upgrade to 600MHz PIIIs
just to get that super performance over C when C is doing "adequately" on
a 133MHz PPro. this _actually_ means that the performance gain for C is
much smaller than the performance gain for Lisp, but it is taken to mean
that the _processors_ wouldn't provide that much performance gain for Lisp,
either, considering the mind-bogglingly unintelligent conflation of C with
"optimal hardware performance".

C and C++ are firmly stuck in the early 80's as far as processor design
is concerned. Common Lisp never was stuck anywhere in the first place,
which means it has had a hard time sticking to something in particular,
as well, but at least it has the uncommon ability to let go.

the task of good programmers everywhere is to get rid of C, or, more
precisely, the C mindset and the gravely mistaken notion that C is
somehow "inherently efficient".

#:Erik

Michael Hudson

Jan 24, 2000, 3:00:00 AM

Erik Naggum <er...@naggum.no> writes:

[snip C gripes]

> I recently had the opportunity to study the Pentium III from the inside,
> and I was positively amazed by what the Intel guys have managed to hide
> behind that god-awful instruction set they have to be compatible with.
> it's good stuff, and it isn't being used because people are still stuck
> on the 386 and refuse to compile programs for 686s.
>
> if you actually _used_ the Pentium III architecture for Lisp,

Can you do this? I was of the understanding (which shouldn't be
thought of as particularly deep) that you couldn't program to the p3
core directly - which leads to a silly game of compiler writers
thinking "this is the most optimal set of low level ops that I'd like
the p3 core to execute, so I'll compile this code to this stream of
x86 opcodes, which will in due course be decoded into something like
what I want." Which is wasteful of everyone's time, both the compiler
writer's and the user's ('cause there's silicon that could be doing
useful stuff doing the decoding).

It will be interesting to me to see what happens with the IA-64, which
(hopefully) will have a sane ISA and makes no concessions to poorly
compiled code (again, as I understand it).

> you'd truly run rings around C, but since Lisp vendors, too, tend
> to stay with what is likely to sell well, you can't ask people to
> upgrade to 600MHz PIIIs just to get that super performance over C
> when C is doing "adequately" on a 133MHz PPro. this _actually_
> means that the performance gain for C is much smaller than the
> performance gain for Lisp, but it is taken to mean that the
> _processors_ wouldn't provide that much performance gain for Lisp,
> either, considering the mind-bogglingly unintelligent conflation of
> C with "optimal hardware performance".
>
> C and C++ are firmly stuck in the early 80's as far as processor design
> is concerned. Common Lisp never was stuck anywhere in the first place,
> which means it has had a hard time sticking to something in particular,
> as well, but at least it has the uncommon ability to let go.

Does this mean you think CL will/should outperform C on a processor
that puts a less demented face to the world (eg. G4, IA-64)? That'd be
nice.

> the task of good programmers everywhere is to get rid of C, or, more
> precisely, the C mindset and the gravely mistaken notion that C is
> somehow "inherently efficient".

Just getting rid of languages that don't have GC would please me no
end (I spent the summer working in C++, and spent many a happy hour
chasing obscure leaks and bad pointer dereferences).

Cheers,
Michael

Barry Margolin

Jan 24, 2000, 3:00:00 AM

In article <31577258...@naggum.no>, Erik Naggum <er...@naggum.no> wrote:
> I recently had the opportunity to study the Pentium III from the inside,
> and I was positively amazed by what the Intel guys have managed to hide
> behind that god-awful instruction set they have to be compatible with.
> it's good stuff, and it isn't being used because people are still stuck
> on the 386 and refuse to compile programs for 686s.
>
> if you actually _used_ the Pentium III architecture for Lisp, you'd truly
> run rings around C, but since Lisp vendors, too, tend to stay with what
> is likely to sell well, you can't ask people to upgrade to 600MHz PIIIs
> just to get that super performance over C when C is doing "adequately" on
> a 133MHz PPro.

I'm not familiar with the Pentium III architecture (I haven't really
studied an instruction set since the Z80). Could you give some examples of
the features you're talking about? Are these things like the MMX
instructions (those were in the Pentium II, as well, weren't they)? I know
it has lots of bit-twiddling instructions that are supposed to be useful
for graphics, and I think some DSP-related instructions, but neither of
these seem like they would be of much assistance in Lisp.

How do other applications make use of these features? Isn't there some way
to use the instructions and trap into macrocode when running on an older
processor? Is the performance hit of this trap severe enough that it would
make the old processors too slow (customers don't like it when a new
release of a program is slower than the old one)?

Robert Monfera

Jan 24, 2000, 3:00:00 AM

Barry Margolin wrote:

> Are these things like the MMX
> instructions (those were in the Pentium II, as well, weren't they)?

These are just my thoughts, I am interested in Erik's examples as well.

There are some specific things that would be usable, for example, 80-bit
integers or floats.

Here's a quote from Intel's site introducing MMX this way:

[...]
SIMD: the Single Instruction Multiple Data (SIMD) process allows
the chip to reduce multiple compute-intensive loops common with
video, audio, graphics and animation. It's like a drill sergeant
telling an entire platoon, "About face," rather than commanding
each individual soldier one at a time.

There are 128-bit registers, rounding modes, bit operations, many
possibly useful instructions...

I don't think multimedia is so special. Working on geometry or working
on generic CPU-intensive vector calculations have a lot in common.
Bit-reverse and several other instructions may be special, but many
instructions are as general-purpose as they can get.

Of course, Intel will design for and brag about whatever is relevant for
the hot topic of the day (multimedia, internet, 3D games), but quite
possibly these developments also result in a Lisp-friendlier processor that
is ready to be exploited.

Regards
Robert

Carl Shapiro

Jan 24, 2000, 3:00:00 AM

Barry Margolin <bar...@bbnplanet.com> writes:

[...]

> I'm not familiar with the Pentium III architecture (I haven't really
> studied an instruction set since the Z80). Could you give some examples of
> the features you're talking about? Are these things like the MMX
> instructions (those were in the Pentium II, as well, weren't they)?

I believe he is referring to the uOP's (Micro OP's) which the
Pentium Pro and Pentium II/III instruction set is actually implemented
on top of. Modern Intel x86 microprocessors decode the x86
instructions into the RISC-like uOP's which can then be executed
out-of-order and do other such nice things. I do not believe there is
a documented way to execute uOP's directly.

The AMD/NexGen x86 compatible designs employ a conceptually similar
scheme using what they call "ROP's" (RISC OP's).

Andy Freeman

Jan 25, 2000, 3:00:00 AM

In article <m3iu0jq...@atrus.jesus.cam.ac.uk>,

Michael Hudson <mw...@cam.ac.uk> wrote:
> It will be interesting to me to see what happens with the IA-64, which
> (hopefully) will have a sane ISA and makes no concessions to poorly
> compiled code (again, as I understand it).

Intel has released rather detailed descriptions of the IA-64 ISA.
The "Application Developer's Architecture Guide" is order #
2451888-001. The US toll-free number for Intel literature is
800-548-4725 and supposedly there's a way to order Intel
literature through the web site - www.intel.com. (I'd be
surprised if they charged for these docs in any country which
has a local sales office.)

The IA-64 ISA does seem to demand a complex compiler merely
to generate reasonable code. I don't understand it well enough
to know when that complexity generates significant performance
gains, i.e., good code. I'd guess that it's easy to misuse some
of the "performance enhancing features" and it's hard to ignore
some of them.

I certainly can't say whether it is better for lisp than C, but
I'd guess that at least a few of the performance enhancing features
simply don't help with lisp and might help with C and that fewer
give lisp a disproportionate advantage.

-andy


Christopher Browne

Jan 25, 2000, 3:00:00 AM

Centuries ago, Nostradamus foresaw a time when Tim Bradshaw would say:

>* Paul Wallich wrote:
>> It may be that when people are talking about a modern-custom lispm
>> being much faster than a pentium, they are talking about (perlis's term)
>> "discrete exportable functionality" rather than mips. Taking that as an
>> assumption, what would it mean to have a box with (pessimizing here)
>> 25-50X the raw cpu of a lispm, with memory, graphics and disk bandwidth
>> ditto, and the environment of a lispm with the additions that that kind of
>> excess power and memory would make possible.
>
>But having that is nothing to do with hardware. That whole
>environment could live on top of any stock-hardware lisp system. I
>mean, CLIM already runs on stock HW and is already *significantly*
>faster than the symbolics window system was on proprietary HW. What it
>requires is someone to invest the necessary millions of dollars that
>were invested in the lispm *software* into a new lot of software which
>has all this cool functionality, and is brought up to date as well
>(web and so on). All of this stuff could live on top of X or Windows
>if need be. You don't need some mythical blindingly fast lisp chip to
>do that, you just need to type maybe a couple of million lines of code
>or something.

Unfortunately, it appears that there has not been the economic
incentive to either:

a) Deploy newer "blindingly fast" Lisp hardware, or
b) Rewrite the code for newer hardware.

Note that option a), deploying newer Lisp Machine hardware, would
probably also mandate rewriting the code *again.*

Which would encourage doing a fresh "hardware selection," which would
encourage considering a plethora of "canned" options such as:
- Alpha, which provides 64 bits of address space, permitting "wasting"
  a few bits on data typing information;
- IA-64, with some similar possibilities, although I seem to recall
  seeing Gartner Group reports that suggested that it would be bad
  at highly dynamic Lisp-like "stuff;"
- Something like Transmeta;
- Perhaps creating custom-from-scratch hardware, albeit with
  considerable design costs.
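
To make the "wasting a few bits on data typing information" idea
concrete, here is a minimal sketch in Python of low-bit tagging in a
64-bit word. It is not taken from any real Lisp implementation: the
3-bit tag width, the tag values, and all function names are
assumptions chosen for illustration, and only non-negative fixnums are
handled.

```python
# Illustrative sketch only. TAG_BITS, the tag values, and the helper
# names are hypothetical; real Lisp systems choose their own layouts.
TAG_BITS = 3
TAG_MASK = (1 << TAG_BITS) - 1
PAYLOAD_LIMIT = 1 << (64 - TAG_BITS)

TAG_FIXNUM = 0b000
TAG_CONS = 0b001
TAG_SYMBOL = 0b010


def make_word(payload, tag):
    """Pack a non-negative payload and a type tag into one 64-bit word."""
    if not 0 <= tag <= TAG_MASK:
        raise ValueError("tag does not fit in TAG_BITS")
    if not 0 <= payload < PAYLOAD_LIMIT:
        raise ValueError("payload does not fit in the remaining bits")
    return (payload << TAG_BITS) | tag


def tag_of(word):
    """Return the type tag stored in the low bits of the word."""
    return word & TAG_MASK


def payload_of(word):
    """Return the payload stored above the tag bits."""
    return word >> TAG_BITS


def add_fixnums(a, b):
    """Add two tagged fixnums after checking both tags at run time.

    Because the fixnum tag is zero, the tagged words can be added
    directly and the result is already a correctly tagged fixnum.
    """
    if tag_of(a) != TAG_FIXNUM or tag_of(b) != TAG_FIXNUM:
        raise TypeError("both arguments must be fixnums")
    result = a + b
    if payload_of(result) >= PAYLOAD_LIMIT:
        raise OverflowError("result does not fit in a fixnum")
    return result
```

With a 64-bit word, giving up three bits still leaves 61 bits of
payload, which is why a wide address space makes this kind of tagging
cheap.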

Simply going with IA-32 would seem to me to be worth *not*
considering, as it seems to me that if IA-64 does *anything* in the
marketplace, Intel is likely to try to "bury" IA-32 in favor of
IA-64...
--
Rules of the Evil Overlord #42. "I will see to it that plucky young
lads/lasses in strange clothes and with the accent of an outlander
shall REGULARLY climb some monument in the main square of my capital
and denounce me, claim to know the secret of my power, rally the
masses to rebellion, etc. That way, the citizens will be jaded in case
the real thing ever comes along."
<http://www.eviloverlord.com/lists/overlord.html>
cbbr...@hex.net- <http://www.hex.net/~cbbrowne/lisp.html>

Paolo Amoroso

Jan 25, 2000, 3:00:00 AM

On Mon, 24 Jan 2000 18:37:46 GMT, Barry Margolin <bar...@bbnplanet.com>
wrote:

> studied an instruction set since the Z80). Could you give some examples of
> the features you're talking about? Are these things like the MMX
> instructions (those were in the Pentium II, as well, weren't they)? I know

Some time ago I posted a similar question to comp.lang.lisp and I got an
interesting--as usual--reply from Duane Rettig. The subject of the post was
"Re: LISP to C vs machine level (was: LISP for Windows)", its date 19 Dec
1998. If you have difficulties retrieving the article with Deja I can send
a copy to you.

Paolo
--
EncyCMUCLopedia * Extensive collection of CMU Common Lisp documentation
http://cvs2.cons.org:8000/cmucl/doc/EncyCMUCLopedia/

Tim Bradshaw

Jan 25, 2000, 3:00:00 AM

* Christopher Browne wrote:

> a) Deploy newer "blindingly fast" Lisp hardware, or
> b) Rewrite the code for newer hardware.

I think this slightly misses my point -- though less badly than many
others have!

What I am trying to say is that it's not really necessary to worry
about hardware at this point. Good Lisp systems on stock hardware are
not that much slower than the fastest language implementations that
currently run on that same hardware. In `fair tests' they might be as
bad as 1/2 the speed. However fair tests are hard to come by because
the fastest language implementations are typically sufficiently
low-level that for large programs they are crippled by reinventing
half of Lisp, badly, and thus don't perform anywhere near their
theoretical maximum.

So if you want a neo-Lispm I think you should concentrate exclusively
on software and, with few exceptions, software at a fairly high level
(don't try and write a window system substrate, use X / Windows for
instance, don't try and write CL-from-zero, start from an existing
implementation & so on). There are a couple of low-level places that
are still interesting -- threading / processes for one, and
64-bittiness to the extent that big VM is a win (but 64bit machines
and Lisps will turn up fast enough).

(None of this should be taken to imply that *I* want a neo-Lispm!).

--tim

Tim Josling

Jan 26, 2000, 3:00:00 AM

In my experience with hardware and software vendors, if benchmark
information is not available, then the numbers can be assumed to
be bad.

I do not see Intel shouting IA64 benchmark results from the
rooftops. I have read the documentation, which you can get from
their web site, and my conclusion is:

- it will be hard to do good code generation for this chip
- I suspect this may be another Intel disaster, in the fine
tradition of the 286, the x87 and the 432

AMD has a pretty good critique of ia64 on their web site. Their
argument is that the only part of the ia32 architecture that is
actually a constraint on performance is the floating point
register stack.

Generating lisp code for ia64 should be no worse than for any
other language. The usual questions remain: how do you do efficient
dynamic typing, how do you do fast garbage collection, and how do you
deal with the possibility of redefining functions at any time?

Tim Josling

Andy Freeman wrote:
...

> The IA-64 ISA does seem to demand a complex compiler merely
> to generate reasonable code. I don't understand it well enough
> to know when that complexity generates significant performance
> gains, i.e., good code...

Christopher Browne

Jan 26, 2000, 3:00:00 AM

Centuries ago, Nostradamus foresaw a time when Tim Bradshaw would say:

>* Christopher Browne wrote:
>> a) Deploy newer "blindingly fast" Lisp hardware, or
>> b) Rewrite the code for newer hardware.
>
>I think this slightly misses my point -- though less badly than many
>others have!

Less than you might think. The point of diverging from your point ;-]
is to suggest that if one is going to the effort of finding
replacements for the probably-starting-to-fall-apart older Lisp
Machines (after all, hard drives don't last forever!), it's a pretty
big effort that mandates looking for creative solutions.

>What I am trying to say is that it's not really necessary to worry
>about hardware at this point. Good Lisp systems on stock hardware are
>not that much slower than the fastest language implementations that
>currently run on that same hardware. In `fair tests' they might be as
>bad as 1/2 the speed. However fair tests are hard to come by because
>the fastest language implementations are typically sufficiently
>low-level that for large programs they are crippled by reinventing
>half of Lisp, badly, and thus don't perform anywhere near their
>theoretical maximum.

That's pretty fair.

>So if you want a neo-Lispm I think you should concentrate exclusively
>on software and, with few exceptions, software at a fairly high level
>(don't try and write a window system substrate, use X / Windows for
>instance, don't try and write CL-from-zero, start from an existing
>implementation & so on). There are a couple of low-level places that
>are still interesting -- threading / processes for one, and
>64-bittiness to the extent that big VM is a win (but 64bit machines
>and Lisps will turn up fast enough).
>
>(None of this should be taken to imply that *I* want a neo-Lispm!).

There is probably considerable merit to working on "high level" tools
for things like graphics that would not need to change *regardless* of
what might go on with the hardware underneath.

For instance, you might use X as a substrate, and then build a layer
on top of that (not unlike the way the GTK people use something called
GDK, and Tk has a similar layer between it and [X|Win32|...]) so that
it can be at least somewhat "substrate-agnostic."

On the other hand, right now is *not* a great time to do vast
quantities of assembly language coding, because there's so much "CPU
flux."

--
"People are more vocally opposed to fur than leather because it's
easier to harass rich women than motorcycle gangs." [bumper sticker]
cbbr...@ntlug.org- <http://www.ntlug.org/~cbbrowne/lsf.html>

Tim Bradshaw

Jan 26, 2000, 3:00:00 AM

* Tim Josling wrote:

> - it will be hard to do good code generation for this chip
> - I suspect this may be another Intel disaster, in the fine
> tradition of the 286, the x87 and the 432

I think the theory is that the Merced *implementation* is bad, but
Intel are betting that the next implementation of the architecture
(McKinley?) will be fast. Even if IA-64 is bad, Intel do have a
certain amount of experience of getting quite competitive performance
from bad instruction set designs... I've heard this theory from at
least 2 sources in the last few months.

Certainly if they can't either make some IA-64 implementation
competitive or rapidly come up with another 64bit architecture they're
in pretty serious trouble, since the other workstation-class and above
processor makers have all (?) been shipping 64bit systems for quite
some time.

Robert Monfera

Jan 26, 2000, 3:00:00 AM

Tim Bradshaw wrote:

> Certainly if they can't either make some IA-64 implementation
> competitive or rapidly come up with another 64bit architecture they're
> in pretty serious trouble, since the other workstation-class and above
> processor makers have all (?) been shipping 64bit systems for quite
> some time.

Isn't it the very argument that Intel may continue to get away with it
in the future also? We'd all use Alphas or Sparcs if the number of bits
and performance had been more important in the past than
compatibility.

Robert

Tim Bradshaw

Jan 26, 2000, 3:00:00 AM

* Robert Monfera wrote:

> Isn't it the very argument that Intel may continue to get away with it
> in the future also? We'd all use Alphas or Sparcs if the number of bits
> and performance had been more important in the past than
> compatibility.

I don't think that the performance issue is a very big deal --
Pentiums (Pentia?) are pretty competitive. I guess they're slower
than Alphas but perhaps faster than SPARC which has not been a great
performer for a long time.

Much more important I think is address space starvation. 4GB is not a
lot for big machines already, and there is a lot of profit in big
machines. I guess 4GB is fine for PCs for a few years (how many I
wonder, perhaps no more than 4?), but the cost of the processor is a
big issue for PCs, and Intel have some fairly aggressive competition
for IA-32 there (AMD &co, and I guess Transmeta now).

--tim

Erik Naggum

Jan 26, 2000, 3:00:00 AM

* Barry Margolin <bar...@bbnplanet.com>

| I'm not familiar with the Pentium III architecture (I haven't really
| studied an instruction set since the Z80). Could you give some examples
| of the features you're talking about?

I'd like to, but that needs delving back into the documentation and doing
quite a bit of research. I have not found time for it in the past few
days, and a three-week break is coming up fast, so I have to disappoint
you with long response times on this one.

| Are these things like the MMX instructions (those were in the Pentium II,
| as well, weren't they)?

the simplest idea is to use published knowledge of the core architecture
to schedule register updates, memory transactions, etc, vastly more
efficiently. several new instructions have also been added to make life
a lot easier for particular tasks -- but actually being able to use them
well requires significant effort on both the part of the programmers and
the compiler writers.

| I know it has lots of bit-twiddling instructions that are supposed to be
| useful for graphics, and I think some DSP-related instructions, but
| neither of these seem like they would be of much assistance in Lisp.

once I understood what they were doing, they were useful mathematical
functions and transformations that required you to shape your problem in
a particular way to be able to use. some of these "contortions" would
not be something a Lisp compiler would ordinarily engage in at all.
however, I wanted to be able to use these instructions and looked at how
I would have to go about that, but in the end, I decided against adding
instructions to the compiler.

| How do other applications make use of these features?

inlined assembly, or even whole functions in assembly language. there
are few compilers out there that can make full use of these things, but
the instruction scheduling isn't terribly hard to codify. the best part
is that when it's done properly, it doesn't affect performance on the
older processors. the new instructions are sometimes expanded as macros
by the better assemblers into instruction sequences for older processors.

| Isn't there some way to use the instructions and trap into macrocode when
| running on an older processor?

there are a whole bunch of CPU flags which tell you whether these new
features are available or not, so you would typically prepare a fat
binary that contained several versions of the same function according to
processor to make optimal use of these features. many games do this.
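
The "several versions of the same function, chosen by CPU flag"
approach can be sketched as follows. This is a toy model in Python,
not how a real fat binary is built: the flag set is passed in by hand
(real code would read it from the processor, e.g. via CPUID), and
dot_unrolled merely stands in for a routine hand-tuned for newer
hardware. All function names here are assumptions for illustration.

```python
# Illustrative sketch only: a function table keyed by CPU feature flags.
# The flag names mirror common x86 feature names; everything else is
# hypothetical.

def dot_generic(xs, ys):
    """Baseline version that runs on any processor."""
    total = 0
    for x, y in zip(xs, ys):
        total += x * y
    return total


def dot_unrolled(xs, ys):
    """Stand-in for a tuned version: handles four elements per step."""
    total = 0
    blocked = len(xs) - len(xs) % 4
    for i in range(0, blocked, 4):
        total += (xs[i] * ys[i] + xs[i + 1] * ys[i + 1]
                  + xs[i + 2] * ys[i + 2] + xs[i + 3] * ys[i + 3])
    for i in range(blocked, len(xs)):  # leftover elements
        total += xs[i] * ys[i]
    return total


# Most capable variant first; the first flag the CPU reports wins.
VARIANTS = [("sse", dot_unrolled)]


def pick_variant(cpu_flags):
    """Choose an implementation once, at startup, from the CPU's flags."""
    for flag, function in VARIANTS:
        if flag in cpu_flags:
            return function
    return dot_generic
```

The selection happens once, so the per-call cost on an older
processor is the same as if only the generic version had been
shipped.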

#:Erik

Christopher R. Barry

Jan 27, 2000, 3:00:00 AM

Tim Bradshaw <t...@cley.com> writes:

> * Christopher Browne wrote:
>
> > a) Deploy newer "blindingly fast" Lisp hardware, or
> > b) Rewrite the code for newer hardware.
>
> I think this slightly misses my point -- though less badly than many
> others have!
>
> What I am trying to say is that it's not really necessary to worry
> about hardware at this point. Good Lisp systems on stock hardware are
> not that much slower than the fastest language implementations that
> currently run on that same hardware.

This whole thread (which I am now catching up on, having finally done
most of my homework...) has so far only debated the performance
aspects of Lisp vs. non-Lisp hardware. Why do we not have locatives in
Allegro CL or Lispworks though? Or a debugger that works as well as
the Symbolics one which (in my experience) _always_ and _reliably_
gives you all information with names about everything at any level on
the stack and from breakpoints in your program lets you evaluate forms
interactively in the environment of the breakpoint as well as modify
this environment before continuing.

Theoretically I suppose the Franz and Harlequin people could make code
compiled with (DEBUG 3) 5 times slower and larger and add all this
Symbolics functionality to the debugger. But the important point I am
making is that they of course haven't, and they possibly would if
there was perhaps a reasonably efficient and not too painful way to do
it.

The other important point is that Lisp hardware makes compiler writing
and low-level Lisp hacking easier. It's more fun to look at the output
of DISASSEMBLE on a Lisp machine than on an Intel processor, though
ultimately the code may not run faster than it would on an Intel....

Christopher

[Who today handed in a programming assignment not due until Feb 16,
because he got to use Lisp and bang the thing out in less than a day's
time while all his classmates get to suffer in C++ hell. And I banged
the thing out despite not having programmed for some time and wasting
a lot of time suffering a minor bout of temporary stupidity about
arrays and fill-pointers....]

Robert Monfera

Jan 27, 2000, 3:00:00 AM

"Christopher R. Barry" wrote:

> Or a debugger that works as well as
> the Symbolics one which (in my experience) _always_ and _reliably_
> gives you all information with names about everything at any level on
> the stack and from breakpoints in your program lets you evaluate forms
> interactively in the environment of the breakpoint as well as modify
> this environment before continuing.

If you don't have a particular need to debug compiled code, you may try
an interpreted version. Regarding the names and values of all variables
in the (compiled) lexical environment, it depends on whether the
variable is visible at the particular point. For example, variables
bound by LET may be thrown away after their last use, before the
physical end of the lexical environment.

> Theoretically I suppose the Franz and Harlequin people could make code
> compiled with (DEBUG 3) 5 times slower and larger and add all this
> Symbolics functionality to the debugger. But the important point I am
> making is that they of course haven't, and they possibly would if
> there was perhaps a reasonably efficient and not too painful way to do
> it.

It must require a lot more resources and risk-tolerance to reimplement
CL on a specialised chip than to bring current implementations to the
Symbolics level of usability if there is such a gap.

Robert

Tim Bradshaw

Jan 27, 2000, 3:00:00 AM

* Christopher R Barry wrote:

> This whole thread (which I am now catching up on, having finally done
> most of my homework...) has so far only debated the performance
> aspects of Lisp vs. non-Lisp hardware. Why do we not have locatives in
> Allegro CL or Lispworks though? Or a debugger that works as well as
> the Symbolics one which (in my experience) _always_ and _reliably_
> gives you all information with names about everything at any level on
> the stack and from breakpoints in your program lets you evaluate forms
> interactively in the environment of the breakpoint as well as modify
> this environment before continuing.

I think the locatives issue is a reasonably good point, but you can
actually live without them for most purposes, and cheap (non-consing)
locatives probably almost prevent an implementation on stock hardware
and thus should (IMO) never be in CL-the-standard.

I think the answers to the other questions basically are lack of
resource and lack of customer demand. Duane Rettig gave a
presentation at last year's LUGM which described some support for some
very cool-looking debugging stuff (I forget the details I'm afraid,
Duane probably is reading this and has them ...) which looked to me
like it could really do a lot of interesting things. I'm reasonably
confident that if you offered to fund one of the vendors to produce
really cool debugging tools they would come up with the goods.
Remember how much money was spent at Symbolics...

Another point to bear in mind is that an optimising compiler often
compiles away all sorts of things that you think are there. The
symbolics one tended not to but then it didn't do very much
optimisation.

(Incidentally, the symbolics debugger has some pretty buggy areas,
particularly the looking-at-source stuff just doesn't work in many
cases).

--tim

Christopher R. Barry

Jan 27, 2000, 3:00:00 AM

Tim Bradshaw <t...@cley.com> writes:

> Another point to bear in mind is that an optimising compiler often
> compiles away all sorts of things that you think are there.

But (SPEED 0) (DEBUG 3) or something similar should make that a non-issue.

> The symbolics one tended not to but then it didn't do very much
> optimisation.
>
> (Incidentally, the symbolics debugger has some pretty buggy areas,
> particularly the looking-at-source stuff just doesn't work in many
> cases).

I found it always worked as long as you remembered to compile with
"source locators" toggled on. Most of the system sources I guess were
compiled with them off, since they are supposed to bloat everything
and make it slower. In Zmacs you compile a form with source locators
temporarily toggled by doing c-m-sh-C instead of just c-sh-C (IIRC).

Christopher

Christopher R. Barry

Jan 28, 2000, 3:00:00 AM

Robert Monfera <mon...@fisec.com> writes:

> Regarding the names and values of all variables in the (compiled)
> lexical environment, it depends on whether the variable is visible
> at the particular point.

With (DEBUG 3) and low speed, visibility should correspond to what it
conceptually is to a human.

> For example, variables bound by LET may be thrown away after their
> last use, before the physical end of the lexical environment.

As they should be by the compiler; but when (DEBUG 3) and low speed are
set, and I set a breakpoint within the live range of the lexical
variables, even if it occurs after their last use and the compiler is
free to clobber them, I want to see their names and all their info.

Christopher

Robert Monfera

Jan 28, 2000, 3:00:00 AM

"Christopher R. Barry" wrote:

> With (DEBUG 3) and low speed, visibility should correspond to what it
> conceptually is to a human.

I agree that this should be the very purpose of (DEBUG 3). What I am
asking is practical, not conceptual: is there something (variable names,
ability to change them) that you don't get when you debug an
_interpreted_ function?

Robert

Christopher R. Barry

Jan 29, 2000, 3:00:00 AM

Robert Monfera <mon...@fisec.com> writes:

So far that seems to work okay. Unfortunately, if you are working with
a large program with many components that aren't near fully debugged,
running all of them interpreted is just _too_ slow (like hundreds of
times instead of maybe 1.5-5 times or whatever for (DEBUG 3)).

Christopher

Tim Bradshaw

Jan 29, 2000, 3:00:00 AM

* Christopher R Barry wrote:

> So far that seems to work okay. Unfortunately, if you are working with
> a large program with many components that aren't near fully debugged,
> running all of them interpreted is just _too_ slow (like hundreds of
> times instead of maybe 1.5-5 times or whatever for (DEBUG 3)).

Why would you do that? Run it all compiled, wait till it breaks and
then rerun the bit that broke with that function interpreted.

--tim

Christopher R. Barry

Jan 29, 2000, 3:00:00 AM

Tim Bradshaw <t...@cley.com> writes:

Because it might be a "deep" bug. One of those ones that only shows up
once every two weeks or months or something when in rare circumstances
the 5+ different conditions for the bug are met and when you get into
the debugger you are going to have to work with whatever you've got on
the stack.

Genera is pretty buggy I think, but it's really nice how you go to the
debugger with loads of information instead of having things just
crash/core-dump/whatever.

Note that I am _very_ _very_ pleased with the Allegro CL debugger at
least. I've spent much time with it and it is an excellent tool that,
while not perfect (neither is the Symbolics one, nor could any debugger
be "perfect"), one can be very productive with.

Christopher

Robert Monfera

Jan 29, 2000, 3:00:00 AM

"Christopher R. Barry" wrote:

>
> Tim Bradshaw <t...@cley.com> writes:
>
> > Why would you do that? Run it all compiled, wait till it breaks and
> > then rerun the bit that broke with that function interpreted.
>
> Because it might be a "deep" bug. One of those ones that only shows up
> once every two weeks or months or something when in rare circumstances
> the 5+ different conditions for the bug are met and when you get into
> the debugger you are going to have to work with whatever you've got on
> the stack.

This scenario assumes that you would be running your application with
(debug 3) possibly for months. If you consider the overhead of
maintaining symbol information and avoiding the very aggressive
optimizations compilers do, chances are the factor of slowdown would
probably be closer to ~100 than to 5. This is my guess only, and I
don't know how much faster a "genuine" (debug 3) would be compared to
interpreted code.

Also, I found that these types of elusive errors come up when I assure
the compiler that something would be of a certain type, and I don't keep
the promise - these could be caught with a higher safety and lower speed
level.

Robert

Tim Bradshaw

Jan 29, 2000, 3:00:00 AM

* Christopher R Barry wrote:

> Because it might be a "deep" bug. One of those ones that only shows up
> once every two weeks or months or something when in rare circumstances
> the 5+ different conditions for the bug are met and when you get into
> the debugger you are going to have to work with whatever you've got on
> the stack.

But I think running with a (DEBUG 3) that does what you want is going
to cause you really serious slowdown in any case. Optimizing
compilers optimize for a reason...

--tim

Jeff Dalton

Jan 31, 2000, 3:00:00 AM

I've found that I can run with fairly high optimization settings and
still have enough "debug-ability" for most purposes. I laugh at
nearby C++ programmers who seem to need two versions, one with -g
and one without. I sometimes have to resort to running something
interpreted, but not very often.

Someone mentioned complex or deep bugs where you're glad you have lots
of debugging info there all the time, rather than have to run again
and hope the same problem occurs. But I find that in difficult
debugging cases I usually have to rerun anyway, because some of the
information I need is no longer on the stack in any case.

Another useful technique is to build some debugging tools of your own,
something that is fairly easy in Lisp. For instance, when I had some
process-like things that sent "messages" to each other, I wrote some
things to let me monitor messages of specified types or, more
generally, let me specify an arbitrary function that said which
messages were "interesting".
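
A home-grown monitor of that kind might look roughly like the sketch
below. It is written in Python rather than Lisp, and the message
format (a dict with a "type" key) and every name in it are
assumptions made for illustration, not a description of the tool
mentioned above.

```python
# Illustrative sketch only: a monitor that records "interesting" messages.
# Messages are assumed to be dicts with at least a "type" key.

class MessageMonitor:
    def __init__(self):
        self._predicates = []  # functions: message -> bool
        self.log = []          # messages that matched some predicate

    def watch(self, predicate):
        """Register an arbitrary function deciding what is interesting."""
        self._predicates.append(predicate)

    def watch_types(self, *types):
        """Convenience wrapper: monitor messages of the given types."""
        self.watch(lambda message: message["type"] in types)

    def observe(self, message):
        """Call this wherever a message is sent; returns it unchanged."""
        if any(predicate(message) for predicate in self._predicates):
            self.log.append(message)
        return message
```

A send routine would pass each message through observe() on its way
out, so the monitor can be switched on and reconfigured without
touching the processes themselves.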

-- j

Kelly Edward Murray

Feb 7, 2000, 3:00:00 AM

I took a look at this chip as a lisp processor, out of curiosity.
It's been my assessment for some time that CPU cycles are mostly
"free", such that a 10-times instruction-level inefficiency isn't
much of a performance hit today, and running a natively compiled,
run-time typed lisp-like language, you still run circles around
these other languages. Only C/C++/Fortran is faster at raw CPU
efficiency, and who cares.

The performance bottleneck of today's (web) apps
is database access/update. And it's here where
there is opportunity to show massive performance
advance and scalability over traditional
languages and systems by using a native dynamic persistent object
system. This doesn't need
any special hardware. It needs something else.

-Kelly Murray k...@niclos.com http://www.niclos.com

Robert Monfera

Feb 7, 2000, 3:00:00 AM

Kelly Edward Murray wrote:

> The performance bottleneck of today's (web) apps
> is database access/update.

Given the ~$2k/GB price of memory today, this is questionable. If
database performance is the bottleneck, it's best to store the entire
database in the memory and use a memory-optimized representation of
data. For example, I am working on a representation that stores class
instances without the overhead of type information, making it even more
compact and cache-aware. There is no way I can compare its performance
with a disk-based database, even if there's enough memory for the
disk-based one to fit in the memory.
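
One way to picture storing instances without per-object type
information is column-wise storage: each field lives in a
homogeneous array, and an "instance" is just a row index. The sketch
below uses Python's standard array module; the record layout (an
integer id plus two floats) and the class name are assumptions for
illustration, not the representation described above.

```python
# Illustrative sketch only: column-wise storage of fixed-layout records.
from array import array


class PointStore:
    """Stores records (id: int, x: float, y: float) as three typed columns."""

    def __init__(self):
        self.ids = array("q")  # signed 64-bit integers
        self.xs = array("d")   # double-precision floats
        self.ys = array("d")

    def add(self, id_, x, y):
        """Append a record and return its row index."""
        self.ids.append(id_)
        self.xs.append(x)
        self.ys.append(y)
        return len(self.ids) - 1

    def get(self, row):
        """Rebuild the record stored at the given row."""
        return (self.ids[row], self.xs[row], self.ys[row])

    def payload_bytes(self):
        """Bytes of field data held, excluding container overhead."""
        columns = (self.ids, self.xs, self.ys)
        return sum(column.itemsize * len(column) for column in columns)
```

The element type is recorded once per column rather than once per
value, and scanning one field touches contiguous memory, which is
what makes such a layout compact and cache-friendly.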

The number one cost in a web server project is manpower, and usually a
lot of time is spent on database optimizing and schema denormalization.
Memory cost is lower and more predictable. In my experience, a 10GB
disk-based dataset fits nicely in 1GB of memory, because of the much
lower need for denormalization and indices, and better tailorability of
atomic data representation.

Robert

Tim Bradshaw

Feb 8, 2000, 3:00:00 AM

* Kelly Edward Murray wrote:
> I took a look at this chip as a lisp processor, out of curiosity.
> It's been my assessment for some time that CPU cycles are mostly
> "free", such that a 10-times instruction-level inefficiency isn't
> much of a performance hit today, and running a natively compiled,
> run-time typed lisp-like language, you still run circles around
> these other languages. Only C/C++/Fortran is faster at raw CPU
> efficiency, and who cares.

I've heard exactly that claim from people who do serious (commercial)
computationally-intensive stuff -- basically cycles are now free, it's
cache misses that cost you, for `cache' being one of registers, cache,
and memory of various flavours. I guess for web / network apps you'd
also want to add `disk' to this -- going over the network is pretty
bad...

--tim

Espen Vestre

Feb 8, 2000, 3:00:00 AM

Robert Monfera <mon...@fisec.com> writes:

> Given the ~$2k/GB price of memory today, this is questionable. If
> database performance is the bottleneck, it's best to store the entire
> database in the memory and use a memory-optimized representation of
> data.

For some databases, you can't get all of it into memory (unless
you can afford hundreds of terabytes of RAM and have hardware, OS
and a runtime system that will support it).

And for those where you can, there are at least two problems:

- slow response before you get enough cache hits
- garbage collection in *huge* lisp programs is a tricky subject,
at least if you demand 24x7 operability.

--
(espen)

Robert Monfera

Feb 8, 2000, 3:00:00 AM

Espen Vestre wrote:

> For some databases, you can't get all of it into memory (unless
> you can afford hundreds of terabytes of RAM and have hardware, OS
> and a runtime system that will support it).

Yes, this is true, but the context was web servers and OODB as the
alternative. I have yet to hear of an OODB that handles that amount of
data, but I think that most intranet or internet web server images would
fit in a few GBs - and for even larger projects the memory costs may be
insignificant compared to the combined effort of the project.

> And for those where you can, there are at least two problems:
>
> - slow response before you get enough cache hits

If everything is in the physical memory, how could it not beat
disk-based access? Maybe you think of the necessary initial upload,
which is done when you launch the image, rather than at the first
access, or that the image size would exceed the size of the physical
memory.

> - garbage collection in *huge* lisp programs is a tricky subject,
> at least if you demand 24x7 operability.

When there is a massive amount of data, we already avoid GC - for
example, elements of fully declared or fixnum arrays are not GC'd
individually, and the array itself will become old soon - even if you
don't use ACL's :allocation :old option. I also find that arrays
created for temporary purposes are fast to create and GC - maybe it does
not even have to be copied before it's freed, given enough workspace and
general avoidance of large-scale consing.

I think that Lisp's GC is an asset when one demands high uptime; maybe
Erik or somebody else who has done it has some caveats.

Robert

Espen Vestre

Feb 8, 2000, 3:00:00 AM

Robert Monfera <mon...@fisec.com> writes:

> Yes, this is true, but the context was web servers and OODB as the
> alternative.

Ok, my current context is a server that caches parts of a ~500GB
relational database.

> If everything is in the physical memory, how could it not beat
> disk-based access? Maybe you think of the necessary initial upload,
> which is done when you launch the image, rather than at the first
> access, or that the image size would exceed the size of the physical
> memory.

No, as I already said, I was thinking in terms of an application
that uses a relational database as a backend.

> When there is a massive amount of data, we already avoid GC - for

I'm not quite sure what you mean. Have you turned off global GC?
Maybe I'm missing something which could be essential to my application ;-)
--
(espen)

Espen Vestre

Feb 8, 2000, 3:00:00 AM

Espen Vestre <espen@*do-not-spam-me*.vestre.net> writes:

> > disk-based access? Maybe you think of the necessary initial upload,
> > which is done when you launch the image, rather than at the first
> > access, or that the image size would exceed the size of the physical
> > memory.
>
> No, as I already said, I was thinking in terms of an application
> that uses a relational database as a backend.

Let me elaborate on that: I completely agree that memory-only solutions
are very interesting for rather static databases, but I'm working
with databases that are highly dynamic (*lots* of inserts and updates).
If you try to do without any RDB or OODB as a persistent store backend
for such a database, wouldn't that mean that you'd have to reinvent
a whole lot of old database wheels? After all, e.g. Oracle is pretty
clever at caching (we run Oracle servers which use caches of more
than 500MB).
--
(espen)

Christopher Browne

Feb 9, 2000, 3:00:00 AM

Centuries ago, Nostradamus foresaw a time when Espen Vestre would say:

Can I take a contrarian position?

Take a look at new engines like TimesTen <http://www.timesten.com/>
and Caché <http://www.intersys.com/>. These are DBMS systems that
eschew paging.

A possibly-similar approach is taken by FastDB
<http://www.ispras.ru/~knizhnik/fastdb.html>, which combines in-memory
with a transactional scheme that pushes updates to a transaction log
on disk immediately just as is the case with traditional DBMSes.

If you head back to System R, the original RDBMS, and successors
moving through to the big name RDBMSes like Oracle/Informix/DB2, the
common thread is that they all do something analogous to demand
paging. This happens for much the same reason that UNIXes and POSIXes
almost all (QNX as a visible exception) do demand paging, namely that
at the time they were developed, you couldn't possibly have enough
memory to hold the whole database in memory.

The natural evolution that comes from that beginning is that the
approach to DBMS implementation is to start by creating a
demand-paging system under the assumption that you *don't* have enough
RAM to hold the database in memory.

It then makes sense to do heavy-duty caching to minimize the negative
impact of this.

Caché, TimesTen, and FastDB make the contrary assumption that the
whole DB *can* be stored in RAM. You still make sure that updates get
pushed out to disk immediately to keep things robust.

I'll suggest the thought that you may have things backwards; in a
relatively static database, there's likely to be some locality of
reference that may mean that paging turns out to be cheap. If it's
really dynamic, being able to find all your data in RAM is going to
improve performance over having to do a lot of paging to get at
the data.

As far as keeping things robust goes, there ought to be *no*
difference in the cost of writing transaction logs out to disk in
either situation, as a robust transaction log will require similar
operations either way.
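The point about logging can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how TimesTen, Caché, or FastDB are implemented: the class `LoggedStore`, the JSON-lines log format, and the file name `txn.log` are all hypothetical. Every update is appended to a log and forced to disk before the in-memory copy changes, and recovery replays the log.

```python
import json
import os
import tempfile


class LoggedStore:
    """Hypothetical in-memory key/value store backed by an append-only log."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        # Recovery: replay every logged update in order.
        if os.path.exists(log_path):
            with open(log_path, "r", encoding="utf-8") as f:
                for line in f:
                    key, value = json.loads(line)
                    self.data[key] = value
        self.log = open(log_path, "a", encoding="utf-8")

    def put(self, key, value):
        # Write and sync the log record first, then update memory.
        self.log.write(json.dumps([key, value]) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        self.data[key] = value

    def close(self):
        self.log.close()


path = os.path.join(tempfile.mkdtemp(), "txn.log")
store = LoggedStore(path)
store.put("a", 1)
store.put("a", 2)
store.close()

# Simulate a restart: a fresh store rebuilds its state from the log alone.
recovered = LoggedStore(path)
recovered_value = recovered.data["a"]
recovered.close()
```

The disk work per update is one sequential append plus a sync, which is the same kind of operation a paging DBMS needs for its own log.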
--
"I worry that the person who thought up Muzak may be thinking up
something else." -- Lily Tomlin
cbbr...@ntlug.org - - <http://www.hex.net/~cbbrowne/nonrdbms.html>

Robert Monfera

Feb 9, 2000, 3:00:00 AM

Espen Vestre wrote:

> Ok, my current context is a server that caches parts of a ~500GB
> relational database.

Can you estimate what would be the database size if you could do away
with most indices, lock tables, redundancy and denormalization? If you
would not have to use long time stamps and character-based keys but
instead integers? If you stored the parts that are actually used? If
you could split tables so that you don't have a lot of NULLs? Of course
I'm not implying that the result would fit in the memory or that a
RAM-based solution would be the right one for you, but in my experience
few multiple-gig databases store data that is worth nearly that much.

> > When there is a massive amount of data, we already avoid GC - for
>
> I'm not quite sure what you mean. Have you turned off global GC?
> Maybe I'm missing something which could be essential to my application ;-)

Nothing special: simply allocating large, long-living vectors whose
values are immediate - no GC is done on those big chunks (up to
64MB/chunk). Oldspace will only be GC'd if newspace is full, which
should not happen often or at all. What are your experiences with this?

As for tuning, I allocate medium-short-lived (1-60 seconds) vectors
(temporary indices etc.) too, which are good candidates to be tuned -
not much need to do so so far.

You are right that in the case of RAM-based databases there is probably
still a need to reinvent (learn and implement) some common DBMS
techniques, like not actually deleting records (rows) from arrays, just
flagging them, and compacting arrays eventually.
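The flag-then-compact technique mentioned above might look like the following Python sketch. The class `RowTable` and its method names are hypothetical, and the table holds a single integer column to keep the example short.

```python
from array import array


class RowTable:
    """Hypothetical single-column table with flagged (tombstoned) deletes."""

    def __init__(self):
        self.values = array("q")   # signed 64-bit integers, stored unboxed
        self.live = bytearray()    # 1 = row is live, 0 = row is deleted

    def insert(self, value):
        self.values.append(value)
        self.live.append(1)
        return len(self.values) - 1

    def delete(self, row):
        # Deleting only clears the flag; no elements are moved.
        self.live[row] = 0

    def scan(self):
        """Return the values of all live rows, in row order."""
        return [v for v, ok in zip(self.values, self.live) if ok]

    def compact(self):
        """Rebuild the arrays without deleted rows. Row indices change."""
        self.values = array("q", self.scan())
        self.live = bytearray(b"\x01" * len(self.values))


table = RowTable()
for v in (10, 20, 30, 40):
    table.insert(v)
table.delete(1)

before = table.scan()
table.compact()
after = table.scan()
```

Because compaction renumbers rows, anything that holds row indices (for example a temporary index) has to be rebuilt afterwards, which is one reason to compact only occasionally.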

Does Oracle do caching that is more intelligent than fetching records
that have identical key-parts as specified for each table and dumping
the least recently accessed records?

Robert

Espen Vestre

Feb 9, 2000, 3:00:00 AM

Robert Monfera <mon...@fisec.com> writes:

> 64MB/chunk). Oldspace will only be GC'd if newspace is full, which
> should not happen often or at all. What are your experiences with this?

Since I've just started to implement massive caching in my servers,
I'm still a newbie wrt. GC, but I simply have to turn automatic global
GC completely off, since the downtime that a global GC in a
several-hundred-megabyte Lisp image would mean is not acceptable. Since it's
rather new, I'm still not sure what turning it off really implies, but
I could always let my internal 'cron' thread do it in some predefined
weekly service interval...
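By way of analogy only, the same pattern can be shown with Python's standard `gc` module; this is not the Lisp implementation's GC interface. Automatic collection is switched off and a background thread runs a full collection at a fixed interval. The function name `scheduled_collect` and the very short interval are assumptions made so the example finishes quickly. Note that in CPython `gc.disable()` only stops the cyclic collector; reference counting keeps freeing most objects.

```python
import gc
import threading
import time

collections_run = []   # one entry per scheduled collection


def scheduled_collect(interval_seconds, stop_event):
    """Run a full collection every interval until stop_event is set."""
    # Event.wait returns False on timeout and True once the event is set.
    while not stop_event.wait(interval_seconds):
        collections_run.append(gc.collect())


gc.disable()                      # no automatic collections from here on
stop = threading.Event()
worker = threading.Thread(
    target=scheduled_collect, args=(0.01, stop), daemon=True
)
worker.start()

time.sleep(0.2)                   # stand-in for the server doing real work

stop.set()
worker.join()
gc.enable()                       # restore the default behaviour
```

In a real server the interval would be chosen to fall inside a maintenance window rather than every few milliseconds.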

--
(espen)

ArolAmbler

Feb 21, 2000, 3:00:00 AM

In article <ey3901k...@cley.com>, Tim Bradshaw <t...@cley.com> writes:

>* Joe Marshall wrote:
>
>> Current processors are `C machines'. A custom processor designed for
>> Lisp using current technology would significantly outperform a Lisp
>> implementation on stock hardware.
>
>I believe there are papers around written by Lisp/OO people which
>argue fairly strongly against this point of view. I'd be interested
>in any compelling arguments otherwise (especially from lisp compiler
>implementors).
>
Well, I was one of the principal implementors of Common Lisp for
Data General before Common Lisp was "official" (late '70s or
early '80s) on a "stock" monstrosity with 48 different instruction
"formats", and so many instructions they were hard to count.
(A kludge, on a kludge, on a kludge, of a clean simple machine).
Although there were still a significant number of optimizations yet to
be added to the compiler, I studied several functions in detail at the
assembly language level, comparing lisp's output, with fortran's
output (DG had a very good fortran compiler) and with the best
I could think of writing directly in assembly language. The lisp
output was distinctly better than the fortran output, and only marginally
worse than the best I could create by hand.
The "missing" optimizations centered around two things.
1. better utilization of typing information which the user didn't supply,
but that was implicitly available from the functions invoked, or from
predicates used in conditions, or arms in typecase, and similar
places.
2. doing a complete transformation to the applicative domain,
optimizing there, and then transforming back to the normal imperative
domain. As, in the applicative domain, variables don't "vary", but are
only "initialized" and used, if at all, in a read-only way, it is very easy
to do dataflow optimizations, and to remove dead computations, in
a safe way. Once those things are done, the inverse transform back
to the imperative domain gets rid of all the tail recursion, and
turns it back into loops and variables.
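The second item can be illustrated with a small Python sketch showing the same computation in both domains. The function names are hypothetical, and the conversion is done by hand here rather than by a compiler. In the applicative form every name is bound once per call and the loop becomes a tail call; the inverse transform would turn that tail call back into the loop.

```python
def sum_squares_loop(n):
    """Imperative form: 'total' and 'i' are overwritten on every iteration."""
    total = 0
    i = 1
    while i <= n:
        total = total + i * i
        i = i + 1
    return total


def sum_squares_applicative(n):
    """Applicative form: no variable is ever reassigned."""

    def step(i, total):
        # Each call binds i and total exactly once; they are read-only after.
        if i > n:
            return total
        return step(i + 1, total + i * i)   # tail call replaces the loop

    return step(1, 0)


loop_result = sum_squares_loop(10)
applicative_result = sum_squares_applicative(10)
```

Python itself does not eliminate tail calls, so the applicative version is only useful here as a picture of the intermediate form, not as something to run on large inputs.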

If we had descended into writeable control store, we could, perhaps, have done
a few things a little bit better, but nowhere near twice as
well - more like ten percent or less. For SOME programs, such as
those with large bignums, we could have benefited more by having
microcode for the boole and logxxx family, as well as the arithmetic
and comparison functions, and the mask and test etc functions.

A vliw machine typically is NOT a good machine for anything. The
problem is that a vliw machine is just the same as a "normal-sized"
instruction machine, with an added restriction that the instructions
must be gulped in certain patterns (four at a time, or 2 of kind one,
and 1 of kind 2, or many other variants). The HOPE is that the
compiler will be able to use the other parts of the instruction word.
The fact is you have just restricted the compiler's freedom, as well
as any assembly language programmer's. Unless the "benefit" of
wide gulps and simplified memory caching logic can be made to
exceed the loss from the restricted freedom, you have a net loss.
But there is little benefit, as instructions can still be "gulped" with
a wide bus, and "shifted" into place inside the chip with only 1 gate
level. No way will that ever approach the loss caused by forced
idling of alu units because of vliw format restrictions.
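The idle-slot effect can be made concrete with a toy model in Python. Everything here is an assumption for illustration: the bundle template (two ALU slots and one memory slot), the greedy in-order packer, and the rule that instructions are independent of one another. It does not model any real VLIW machine.

```python
# Hypothetical bundle format: two ALU slots and one memory slot.
TEMPLATE = ("alu", "alu", "mem")


def pack(stream, template=TEMPLATE):
    """Greedily pack an in-order stream into bundles; None marks an idle slot."""
    bundles = []
    current = [None] * len(template)
    for kind in stream:
        placed = False
        for i, slot_kind in enumerate(template):
            if current[i] is None and slot_kind == kind:
                current[i] = kind
                placed = True
                break
        if not placed:
            # No free slot of the right kind: close this bundle, start a new one.
            bundles.append(current)
            current = [None] * len(template)
            current[template.index(kind)] = kind
    if any(slot is not None for slot in current):
        bundles.append(current)
    return bundles


# A memory-heavy instruction stream fits the template badly.
stream = ["mem", "mem", "alu", "mem", "mem"]
bundles = pack(stream)
idle = sum(slot is None for bundle in bundles for slot in bundle)
```

With this stream, five instructions occupy four bundles and most ALU slots sit empty; a smarter scheduler could reorder instructions, but it cannot change the mix of operations the program actually needs.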

Note: I said typically. The IA-64 may have found a way around the
typical problem - but lacking benchmark numbers, I doubt it.

ArolAmbler

Feb 21, 2000, 3:00:00 AM

In article <388E2FE0...@melbpc.org.au>, Tim Josling <t...@melbpc.org.au>
writes:

>From: Tim Josling <t...@melbpc.org.au>
>Date: Wed, 26 Jan 2000 10:21:04 +1100
>
>In my experience with hardware and software vendors, if benchmark
>information is not available, then the numbers can be assumed to
>be bad.
>

insert "USUALLY" before assumed. The other reason benchmarks
aren't reported is politics, when there is a "standards group"
controlling things, and all the competitors vote on what the rules
are. So then, somebody finds a loophole in the testing rules,
publishes an "unfair" number, the rules are changed, and much later
somebody invents a new technique - that is "technically" in violation
of the wording of the rule designed to stop a certain kind of cheating.
This is so ESPECIALLY if the rule broken makes the entire product
a great deal better for the customers, as that is bad news for the
majority of the standards group.
Bingo - the world can't hear about the great results. Politics.

But, this is unlikely to be the case with the IA-64.

Joe Marshall

Feb 21, 2000, 3:00:00 AM

ArolAmbler <arola...@aol.com> wrote in message
news:20000221080321...@nso-fz.aol.com...

>
> A vliw machine typically is NOT a good machine for anything. The
> problem is that a vliw machine is just the same as a "normal-sized"
> instruction machine, with an added restriction that the instructions
> must be gulped in certain patterns (four at a time, or 2 of kind one,
> and 1 of kind 2, or many other variants). The HOPE is that the
> compiler will be able to use the other parts of the instruction word.
> The fact is you have just restricted the compiler's freedom, as well
> as any assembly language programmer's. Unless the "benefit" of
> wide gulps and simplified memory caching logic can be made to
> exceed the loss from the restricted freedom, you have a net loss.
> But there is little benefit, as instructions can still be "gulped" with
> a wide bus, and "shifted" into place inside the chip with only 1 gate
> level. No way will that ever approach the loss caused by forced
> idling of alu units because of vliw format restrictions.

I disagree somewhat, but it isn't the width of the instruction that's
important, it is the assumptions that go along with it.
I suppose you could make a machine that is VLIW internally,
and perhaps `partial' VLIW externally. The idea being that the unused
parts of the instruction word are given appropriate defaults if they
are actually missing from the instruction stream.

A multi-byte instruction stream often encodes way too much
information. Thus you have a much more complicated instruction
decode time, and the logic of when during the instruction cycle you
process exceptions becomes really complicated.

Ultimately, I think the answer is `it depends', and I don't think it depends
on the width of the instruction word as much as it depends on other
things.

~jrm

Kelly Edward Murray

Feb 21, 2000, 3:00:00 AM

If you think about it, there really isn't a whole
lot you can do to speed things up in the CPU after you can
execute almost all instructions in one cycle,
and then you've sped it up to 500 MHz+.
You've got to go to parallel execution, of which VLIW
/seems/ more possible to take advantage of at the
compiler level. Reading the IA64 docs, writing
the compiler looks to be a venture worthy of
a dozen PhDs somewhere in there. Intel has too
many smart people who think they can do it.

I recall someone (perhaps at Encore) who suggested that they use VLIW
processors to execute entirely
different user programs in the different slots
of the machine. That would be an effective way
to gain speedup by essentially getting low-cost multiple-cpus on a
single chip.

I've always thought the 386 was a good Lisp arch,
certainly good enough. They are all good enough.
The only bugger was the OS/chips that don't let
you write to executable memory. Just gets in your way, doesn't help.

Just my ramblings..

-kelly murray k...@niclos.com