
Phasing out x86


Hilal

Dec 5, 2007, 10:57:39 AM
Hi,

I am a 4th-year CS student. A friend and I once had a discussion
about how we could get rid of the x86 ISA in a native way. So I wrote
up a small idea sketch and would like to hear your thoughts about it.
Sorry if it is a bit long.

---------------------

<Abstract>:
Today everyone asks what we are going to do with quad or even octa
cores. Maybe we can put in a GPU, a sound card or other cool things.
But all this comes at the expense of the FSB being stressed; it will
be the bottleneck. I want to propose another view of this opportunity.
Why not put in two cores that use two different ISAs?

<Problem>:
x86 has lived through a world where we are constantly trying to be
independent of the technology, so that we can upgrade to a new
technology when it appears. We don't want to break our 'legacy'
programs, and still we want a new and cleaner ISA.

<Solution>:
Today we are totally capable of changing an ISA by utilizing the
opportunities we have with multi-cores. Not because we couldn't have
done the same thing with two processors on one motherboard, but this
time it is much cheaper. And not that we don't have solutions for
running different ISAs, but this time it is done natively. My solution
is two cores, each running its own ISA without any overhead, which
means running two programs natively in different ISAs.

With this solution we can still run the 'legacy' code with a 100%
guarantee that any bug that was present on a pure x86 CPU is still
present on a multi-core CPU with two ISAs!

<Research>:
What we need to make this work is a simple operating system that can
detect the different binaries and assign them to the right core. They
will still run in the same memory and coexist on the same hard drive.
Theoretically this is easily done. The only problem is that we need a
real multi-core CPU that runs two different ISAs simultaneously.
Creating a simulation for this simple and elegant solution would be
overkill, because anyone can see that this should work.
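As a minimal sketch of the 'detect and assign' step (purely
illustrative: the core numbering and the EM_NEWISA value are made up
here; on Linux one could inspect the ELF header and then call
sched_setaffinity):

#define _GNU_SOURCE
#include <elf.h>
#include <sched.h>
#include <stdio.h>

#define EM_NEWISA 0x7fff  /* hypothetical e_machine value for the new ISA */

/* Pin the calling process to cores [first, first+count). */
static int pin_to_cores(int first, int count)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int i = first; i < first + count; i++)
        CPU_SET(i, &set);
    return sched_setaffinity(0, sizeof(set), &set); /* 0 = this process */
}

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <binary>\n", argv[0]);
        return 1;
    }
    FILE *f = fopen(argv[1], "rb");
    if (!f) {
        perror(argv[1]);
        return 1;
    }
    Elf64_Ehdr hdr;                    /* the ISA is recorded in e_machine */
    if (fread(&hdr, sizeof hdr, 1, f) != 1) {
        fclose(f);
        return 1;
    }
    fclose(f);
    if (hdr.e_machine == EM_NEWISA)
        pin_to_cores(2, 6);            /* assumed: new-ISA cores are 2..7 */
    else
        pin_to_cores(0, 2);            /* assumed: legacy x86 cores are 0..1 */
    printf("e_machine=%u, affinity set\n", (unsigned)hdr.e_machine);
    /* a real launcher would now exec the binary */
    return 0;
}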

<Benefit for Intel>:
Intel can now give a reason why we need octa-core processors. Also,
Intel has a once-in-a-lifetime chance to create its own ISA, which
would make life easier for compilers, and the gaming industry would
love some extra registers. Intel would have a nice way of phasing out
x86 code; maybe in 10 years no new code would be written in x86, but
old code could still run as if nothing happened.

This is just the tip of the iceberg. Why not make a CPU that has 3-4
different ISAs? Stack-based, register-register and maybe VLIW? And let
the compiler choose the best ISA for a specific application?

<Final thoughts>:
I believe this is a really simple solution and can't really understand
why no one has thought about it. Or is there a real problem behind it
that I have missed? Is it something you have considered and realized
there is no real benefit to? I would really be happy to hear your
response about this.

thank you in advance,

Stephen Fuld

Dec 5, 2007, 11:20:48 AM
Hilal wrote:
> Hi,
>
> I am a 4th-year CS student. A friend and I once had a discussion
> about how we could get rid of the x86 ISA in a native way. So I wrote
> up a small idea sketch and would like to hear your thoughts about it.
> Sorry if it is a bit long.
>
> ---------------------
>
> <Abstract>:
> Today everyone asks what we are going to do with quad or even octa
> cores. Maybe we can put in a GPU, a sound card or other cool things.
> But all this comes at the expense of the FSB being stressed; it will
> be the bottleneck. I want to propose another view of this opportunity.
> Why not put in two cores that use two different ISAs?

snip

> <Research>:
> What we need to make this work is a simple operating system that can
> detect the different binaries and assign them to the right core.

And for which ISA is this OS written? That is, it itself can't run on
both cores.

> thank you in advance,

--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Hilal

Dec 5, 2007, 12:38:46 PM

Hi,

Thank you for the fast reply.

That is an interesting question. For one, we can recompile the OS;
or, if Microsoft is slow, it can still run on the old ISA and we only
change how it assigns a core to an application. In this way we can
even make sure that only our interesting application is compiled for
the new ISA and gets a 'dedicated' core, because the OS and probably
other old applications will take the old cores.

best regards,

Niels Jørgen Kruse

Dec 5, 2007, 1:24:23 PM
Hilal <mont...@gmail.com> wrote:

> That is an interesting question. For one, we can recompile the OS;
> or, if Microsoft is slow, it can still run on the old ISA and we only
> change how it assigns a core to an application. In this way we can
> even make sure that only our interesting application is compiled for
> the new ISA and gets a 'dedicated' core, because the OS and probably
> other old applications will take the old cores.

Who wants to bury x86? Intel probably wanted to replace x86 with the
Itanium ISA, but that battle was lost and they have had enough attention
from monopoly overseeing authorities that they are probably not that
eager anymore.

They just need to prevent AMD from investing enough in future products
to ever turn the tables. As an Intel stock owner I just hope they can
ease off soon, so earnings can go up.

--
Mvh./Regards, Niels Jørgen Kruse, Vanløse, Denmark

Hilal

Dec 5, 2007, 3:30:43 PM
Hi,

On Dec 5, 7:24 pm, nos...@ab-katrinedal.dk (Niels Jørgen Kruse) wrote:
> Who wants to bury x86? Intel probably wanted to replace x86 with the
> Itanium ISA, but that battle was lost and they have had enough attention
> from monopoly overseeing authorities that they are probably not that
> eager anymore.
>

As I understand it, Itanium runs a VLIW ISA and tried to translate
x86; that is not a native solution, and maybe that's why it was slow,
not to mention that there also were not many applications compiled for
VLIW. I am not very high on Itanium.

But here we have a native solution. Even if apps are not compiled for
this new ISA, they will run as fast as on the old cores, and in the
meantime your PC is "capable" of the new ISA and will get a
performance boost when new binaries come out.

> Mvh./Regards, Niels Jørgen Kruse, Vanløse, Denmark

best regards,

Andrew Reilly

Dec 5, 2007, 4:31:23 PM
On Wed, 05 Dec 2007 16:20:48 +0000, Stephen Fuld wrote:

>> What we need to make this work is a simple operating system that can
>> detect the different binaries and assign them to the right core.
>
> And for which ISA is this OS written? That is, it itself can't run on
> both cores.

Not necessarily. I remember the first article I read about TAO-OS
featured it running on a custom heterogeneous multiprocessor system
(some 68k, some x86, some other processors). Of course TAO-OS is based
on on-load compilation from a RISC-like virtual machine instruction
set, but there's no obvious reason why your putative OS couldn't have
some kind of Apple-style fat binary, already compiled to native code.
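For concreteness, such a fat binary is just a container whose header
lists one slice per ISA. A sketch of the layout, roughly following
Apple's mach-o/fat.h from memory (check the real header before relying
on this):

#include <stdint.h>
#include <stdio.h>

/* All fields are big-endian on disk; the magic is 0xcafebabe. */
struct fat_header {
    uint32_t magic;        /* FAT_MAGIC, 0xcafebabe */
    uint32_t nfat_arch;    /* number of per-architecture slices */
};

struct fat_arch {          /* one per slice, following the header */
    int32_t  cputype;      /* e.g. CPU_TYPE_POWERPC, CPU_TYPE_I386 */
    int32_t  cpusubtype;
    uint32_t offset;       /* file offset of the slice's native code */
    uint32_t size;         /* slice size in bytes */
    uint32_t align;        /* slice alignment, as a power of 2 */
};

int main(void)
{
    printf("header %zu bytes, %zu bytes per arch entry\n",
           sizeof(struct fat_header), sizeof(struct fat_arch));
    return 0;
}

The loader reads the header, picks the slice whose cputype matches
(or, in the scheme proposed here, whichever ISA's cores are free), and
maps only that slice.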

Back in the day, BBC micros sometimes ran with different-architecture co-
processors. I don't know how shared the OS environment was.

About twelve years ago I used a Sony NEWS3860 (?), which ran BSD Unix on
some arrangement of 68020 and MIPS processors. The MIPS core ran user
processes, and the 68020 did device IO, but I have some memory of reading
that the 68020 was actually running something very much like another copy
of BSD, with all of the device drivers in that section.

Also: many x86_64/AMD64 systems are already dealing with multiple
instruction sets, although not typically within the "OS kernel". A
protected-memory micro-kernel like QNX should be able to manage that
trick, though.

Not that there's an awful lot of point, of course.

--
Andrew

YANSWBVCG

Dec 5, 2007, 4:52:09 PM
How about Knuth's MMIX?

Hilal

Dec 5, 2007, 5:30:16 PM
On Dec 5, 10:31 pm, Andrew Reilly <andrew-newsp...@areilly.bpc-users.org> wrote:
> Not necessarily. I remember the first article I read about TAO-OS
> featured it running on a custom heterogeneous multiprocessor system
> (some 68k, some x86, some other processors). Of course TAO-OS is based
> on on-load compilation from a RISC-like virtual machine instruction
> set, but there's no obvious reason why your putative OS couldn't have
> some kind of Apple-style fat binary, already compiled to native code.
>
Yes, the Apple-style fat binary is indeed the same solution, except that:
1- You only have 'the new' core left.
2- You need to recompile all the old programs, and there are a lot of
programs that are still not, and probably never will be, compiled for
the new core. (A friend of mine worked at a printing house, and they
still use PowerPC code for various reasons.) And everyone remembers
Rosetta ;)

> Also: many x86_64/AMD64 systems are already dealing with multiple
> instruction sets, although not typically within the "OS kernel". A
> protected-memory micro-kernel like QNX should be able to manage that
> trick, though.
>

There are other techniques to solve this problem:
*Binary translation
*Microcode translation
*Recompiling
*Emulation

and some others, all having pros and cons, but I think having two ISAs
at the same time is the most secure and easy way to phase out x86.
Another pro is that your compiler can optimize code better for what
you choose to recompile.

best regards,

John Dallman

Dec 5, 2007, 7:25:00 PM
In article
<a656099b-74a8-4bc5...@e6g2000prf.googlegroups.com>,
mont...@gmail.com (Hilal) wrote:

> Today everyone asks what we are going to do with quad or even octa
> cores?

Currently, many-cores is a fashion. There are things it is good for, and
there are things that it is not so good for. It's fashionable at present
for two reasons:

* Putting lots of instances of a CPU core on a die is vastly easier
than designing a single core that uses all those transistors to run
much faster at the same clock speed. And there's a problem with
increasing clock speeds at present.

* Lots of cores works really well for apps that can be done in
client-server form, with one server handling lots of clients. And that
happens to be fashionable at present, because it's the way the WWW
works. It is much less useful for making something that's limited by
CPU power go fast, because programming with lots of threads is hard,
and no replacement idiom has been found to be useful across lots of
types of application, as yet.

> Maybe we can put in a GPU, a sound card or other cool things. But all
> this comes at the expense of the FSB being stressed; it will be
> the bottleneck.

One answer to this is to have several memory controllers on the CPU, and
thus several independent memory busses. This is fairly straightforward,
in principle.

> I want to propose another view of this opportunity. Why not put in
> two cores that use two different ISAs?

How does this help with FSB congestion?

> <Problem>:
> x86 has lived through a world where we are constantly trying to be
> independent of the technology, so that we can upgrade to a new
> technology when it appears. We don't want to break our 'legacy'
> programs, and still we want a new and cleaner ISA.

Actually, x86 doesn't really seem to be the problem. It isn't nice, but
there isn't anything about it that makes processors run a lot slower
these days, now that the transistors to decode x86 into micro-ops have
become trivially achievable. What would your new ISA have as practical
advantages over the old one? Because cleanness and niceness do not do
it: see the fate of the DEC Alpha.

> Today we are totally capable of changing an ISA by utilizing the
> opportunities we have with multi-cores. Not because we couldn't have
> done the same thing with two processors on one motherboard, but this
> time it is much cheaper. And not that we don't have solutions for
> running different ISAs, but this time it is done natively. My solution
> is two cores, each running its own ISA without any overhead, which
> means running two programs natively in different ISAs.
>
> With this solution we can still run the 'legacy' code with a 100%
> guarantee that any bug that was present on a pure x86 CPU is still
> present on a multi-core CPU with two ISAs!

Yes. But you have only half - or thereabouts - the cores of any one
architecture that you would have if the cores all had the same
instruction set. This may seem admirable as a transition to a new
architecture. But before you can get such a transition to start, there
have to be good reasons for the new architecture to succeed.

Itanium has shown that merely having dominant industry players pushing
something really hard isn't enough. The retirement of SPARC from the CPU
power per core competition has shown that vast investment seems to be
necessary to build a CPU that runs fast enough to compete these days.
And 64-bit x86 has very good upwards compatibility indeed from 32-bit
x86, and yet many ISVs are avoiding it and sticking to 32-bit.

So as well as this idea, you seem to need a new ISA that is utterly
compelling: that is significantly cheaper to implement for very high
performance than current ISAs, that is at least as efficient in its use
of RAM, and makes it very easy to re-target x86 apps, drivers and OSes.

I'm not saying that this is actually impossible, but some expensive
attempts have failed fairly thoroughly. More than just the need for it
is required; some new basic thinking seems to be in order.

> <Benefit for Intel>:
> Intel can now give a reason why we need octa-core processors.

They have one. It's called "We're selling them, our marketing department
tells you they are wonderful, and nobody is offering an attractive
alternative". No, that isn't a technical good reason for building 8-core
chips rather than faster two-core ones. Technical good reasons aren't
all that important in the commercial CPU business.

> Also, Intel has a once-in-a-lifetime chance to create its own ISA,
> which would make life easier for compilers, and the gaming industry
> would love some extra registers.

They already had that. It was Itanium. Huge amounts of talent and effort
went into that, and it has failed fairly completely. The reasons seem to
lie in fairly simple failures to manage the project effectively in its
early stages.

> This is just the tip of the iceberg. Why not make a CPU that has 3-4
> different ISAs? Stack-based, register-register and maybe VLIW? And let
> the compiler choose the best ISA for a specific application?

The instruction set makes much less difference these days than you
think. This is an easy mistake to make: ISAs are what computer science
students study. However, performance limits these days are almost
entirely set by the limitations of memory latency and bandwidth. Caches
are where it's really at.

--
John Dallman, j...@cix.co.uk, HTML mail is treated as probable spam.

John Dallman

Dec 5, 2007, 7:25:00 PM
In article <5rojl9F...@mid.individual.net>,
andrew-...@areilly.bpc-users.org (Andrew Reilly) wrote:

> Back in the day, BBC micros sometimes ran with different-architecture
> co- processors. I don't know how shared the OS environment was.

Not at all. No code was normally in both processors' address spaces.
Much of the OS on the co-processor side consisted of simply passing
arguments to the BBC micro side.

Ken Hagan

Dec 6, 2007, 4:24:26 AM
On Wed, 05 Dec 2007 15:57:39 -0000, Hilal <mont...@gmail.com> wrote:

> I believe this is a really simple solution and can't really understand
> why no one has thought about it. Or is there a real problem behind it
> that I have missed? Is it something you have considered and realized
> there is no real benefit to? I would really be happy to hear your
> response about this.

I suppose one could make the argument that this *has* been thought
about. It has been possible for a long time to build a system with
an x86 CPU and a different one. I believe Apple even did so, with
the x86 on some kind of add-in card.

The other sense in which "it has been done" is that all the x86 chips
found in PCs for the past 10-15 years have actually been RISC chips
that dynamically translated x86 into some irrelevant private ISA
that they then execute however they choose.

The essential problem with your "asymmetric multi-core" proposal
is that many of the processes that you run on it can't use half
the processing power that you've paid for. If you *are* willing to
take a 50% cut in performance, then a better idea is to JIT-compile
x86 code into the ISA of your choice and run everything on a more
conventional symmetric multi-core system.
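A minimal skeleton of such a translate-and-cache dispatch loop might
look like this (all names are invented and the translation itself is
stubbed out, so only the control flow is real):

#include <stdio.h>

#define CACHE_SLOTS 1024
typedef long (*block_fn)(long pc);  /* a translated block returns the next pc */

static block_fn cache[CACHE_SLOTS]; /* translation cache, keyed by guest pc */

static long stub_block(long pc)     /* stands in for emitted native code */
{
    return pc + 4;                  /* pretend the block fell through */
}

static block_fn translate(long pc)  /* a real JIT would emit code here */
{
    (void)pc;
    return stub_block;
}

int main(void)
{
    long pc = 0x1000;               /* made-up guest entry point */
    for (int i = 0; i < 4; i++) {
        block_fn fn = cache[pc % CACHE_SLOTS];
        if (!fn)                    /* miss: translate once, reuse afterwards */
            fn = cache[pc % CACHE_SLOTS] = translate(pc);
        pc = fn(pc);
        printf("guest pc -> 0x%lx\n", pc);
    }
    return 0;
}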

But as I said, all recent x86 chips do that in hardware anyway.

Hilal

Dec 6, 2007, 5:06:37 AM
On Dec 6, 1:25 am, j...@cix.co.uk (John Dallman) wrote:
> > I want to propose another view of this opportunity. Why not put in
> > two cores that use two different ISAs?
>
> How does this help with FSB congestion?
>

It does not, but it will avoid it, because probably the GPU will stay
where it is today, with its very fast RAM, instead of being inside and
using the relatively slow main memory.

> So as well as this idea, you seem to need a new ISA that is utterly
> compelling: that is significantly cheaper to implement for very high
> performance than current ISAs, that is at least as efficient in its use
> of RAM, and makes it very easy to re-target x86 apps, drivers and OSes.
>
> I'm not saying that this is actually impossible, but some expensive
> attempts have failed fairly thoroughly. More than just the need for it
> is required; some new basic thinking seems to be in order.
>

That's what we need: a simple ISA, like a simple RISC, so that we can
have even more cores. It isn't a big problem to have 2 old x86 cores
and 4-6 'small' faster cores on this new ISA.
Simplicity will give us the advantage of really going parallel. And I
see this as easier to push than the Itanium way; here your software
will work on a native old core, and the phasing out happens from under
you. The compilers will change first, and maybe a small change to the
OS. Then after some years you only need one x86 core, just to be
backward compatible.

> > This is just the tip of the iceberg. Why not make a CPU that has 3-4
> > different ISAs? Stack-based, register-register and maybe VLIW? And let
> > the compiler choose the best ISA for a specific application?
>
> The instruction set makes much less difference these days than you
> think. This is an easy mistake to make: ISAs are what computer science
> students study. However, performance limits these days are almost
> entirely set by the limitations of memory latency and bandwidth. Caches
> are where it's really at.
>

That's really interesting, because today it seems people are trying to
get rid of the cache for embedded systems. But when it comes to
general-purpose computers I can't see a better way than having a cache,
other than having memory built into the CPU ;) That's a bit off topic
of me.

> --
> John Dallman, j...@cix.co.uk, HTML mail is treated as probable spam.

But yeah, you raised a lot of interesting comments, and maybe this
idea is overshadowed too much by the failure of Itanium? Also, today
it seems like a lot of universities are trying to do reconfigurable
computing. I mean, what if one of the cores in a quad CPU were a small
FPGA? Then if you can integrate software and hardware in a nice way
you would get a very fast application-specific machine.

Thank you for your time.

already...@yahoo.com

Dec 6, 2007, 6:23:43 AM
On Dec 6, 11:24 am, "Ken Hagan" <K.Ha...@thermoteknix.co.uk> wrote:

> On Wed, 05 Dec 2007 15:57:39 -0000, Hilal <montha...@gmail.com> wrote:
>
> The other sense in which "it has been done" is that all the x86 chips
> found in PCs for the past 10-15 years have actually been RISC chips
> that dynamically translated x86 into some irrelevant private ISA
> that they then execute however they choose.
>

Look for the posts of Mitch Alsup in this very group. He once nicely
explained why it is not true.

already...@yahoo.com

Dec 6, 2007, 6:53:35 AM
On Dec 6, 12:06 pm, Hilal <montha...@gmail.com> wrote:
>
> But yeah, you raised a lot of interesting comments, and maybe this
> idea is overshadowed too much by the failure of Itanium?
>

His main point has little to do with Itanium*. Look at Power6. That's
a really astonishing CPU when it comes to throughput-oriented server
tasks or throughput-oriented number crunching. But when you look at
single-threaded performance, especially non-numeric, it is not any
faster than the fastest x86 processors. In fact, it's slightly slower
than Intel's new Penryn, despite Power6 enjoying a 2x if not 3x higher
per-core power budget. And single-thread non-numeric performance is
what matters for the majority of those PC/workstation users who care
about performance at all.

Bottom line - as a general-purpose ISA, x86 is good enough. There is
nothing on the horizon of ISA research to make a decisive (like 3x)
difference in _general-purpose_ performance. Multiply that by the fact
that a clean sheet is nice on paper, but in practice the first
iteration tends to be clean shit. But I guess the latter point is
unlikely to be recognized by either the average student or the average
CS professor.
On the other hand, there is a serious justification for new
_specialized_ instruction sets, and even for putting these new
special-purpose processors on the same die with the main x86 CPU. But
those are going to complement the x86 ISA rather than phase it out.

----
* I wouldn't call Itanium a failure. It's just not as huge a success
as many hype-lovers predicted 10 years ago.

Ken Hagan

Dec 6, 2007, 7:29:48 AM

What isn't? If you're referring to my probable misuse of the term
RISC, then I'm guilty as charged, but in my defence I don't think it
invalidates my main point, which is that most of the benefits of
moving away from x86 were reaped a long time ago. It just isn't
visible to the programmer.

already...@yahoo.com

Dec 6, 2007, 7:36:54 AM
On Dec 6, 2:29 pm, "Ken Hagan" <K.Ha...@thermoteknix.co.uk> wrote:

Yes, I referred to the notion that the likes of K8, PM, Merom and to
some extent even P4 are internally seriously CISCy top to bottom.
Of course, I agree with your main point.

Anton Ertl

Dec 6, 2007, 9:05:01 AM
Hilal <mont...@gmail.com> writes:
>On Dec 6, 1:25 am, j...@cix.co.uk (John Dallman) wrote:
>> > I want to propose another view of this opportunity. Why not put in
>> > two cores that use two different ISAs?
>>
>> How does this help with FSB congestion?
>>
>
>It does not, but it will avoid it, because probably the GPU will stay
>where it is today, with its very fast RAM, instead of being inside and
>using the relatively slow main memory.

If people want to incur the cost of a separate GPU, they can do that
without also incurring the cost of additional incompatible cores.

Concerning the high-bandwidth RAM, a possible solution would be to
have one part of the RAM (say, 1GB) in the form of high-bandwidth RAM
soldered on the motherboard, as is done now for GPUs, and the rest
expandable in slots, as is done now in CPUs. That approach can be
used to deal with both the bandwidth problem of many-core CPUs and of
integrated CPU/GPU combos. I guess there would be some swapping
mechanism between the high-bandwidth and the low-bandwidth RAM.

Alternatively, one would go away from expandable RAM completely, and
buy motherboards with particular amounts of soldered RAM. After
all, most people don't upgrade the RAM during the lifetime of the
computer anyway.

One question with these ideas is if socketing the CPU limits bandwidth
significantly, and if we will have to go with soldered CPUs as well as
soldered RAM eventually.

- anton
--
M. Anton Ertl Some things have to be seen to be believed
an...@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html

Spoon

Dec 6, 2007, 9:30:42 AM
Hilal wrote:

> As I understand it, Itanium runs a VLIW ISA and tried to translate
> x86; that is not a native solution, and maybe that's why it was slow,
> not to mention that there also were not many applications compiled for
> VLIW. I am not very high on Itanium.

AFAIU, the original Itaniums provided an on-die x86 core. (AFAIR, this
x86 core had performance equivalent to that of a PPro200.) When Intel
made the Itanium2, they removed the x86 core, and wrote the IA-32
Execution Layer (binary translation AFAIU).

http://www.intel.com/cd/ids/developer/asmo-na/eng/microprocessors/itanium/66007.htm

I don't think it was possible to run x86 and IA-64 in parallel.
(The two "cores" may have shared some hardware?)

already...@yahoo.com

Dec 6, 2007, 10:24:12 AM
On Dec 6, 4:30 pm, Spoon <root@localhost> wrote:
> Hilal wrote:
> > As I understand it, Itanium runs a VLIW ISA and tried to translate
> > x86; that is not a native solution, and maybe that's why it was slow,
> > not to mention that there also were not many applications compiled
> > for VLIW. I am not very high on Itanium.
>
> AFAIU, the original Itaniums provided an on-die x86 core. (AFAIR, this
> x86 core had performance equivalent to that of a PPro200.) When Intel
> made the Itanium2, they removed the x86 core, and wrote the IA-32
> Execution Layer (binary translation AFAIU).
>
> http://www.intel.com/cd/ids/developer/asmo-na/eng/microprocessors/ita...
>

Actually, the first couple of generations of Itanium2 (McKinley and
Madison) still had x86 hardware. Yes, on the vast majority of
applications the hardware was significantly slower than IA-32 EL.
However, only starting from Montecito (the dual-core variant of
Itanium2, released approximately 1.5 years ago) was the x86 hardware
dropped.

> I don't think it was possible to run x86 and IA-64 in parallel.
> (The two "cores" may have shared some hardware?)

Correct. Not "some" - pretty much all execution units were shared.

Paul A. Clayton

Dec 6, 2007, 12:16:48 PM
On Dec 5, 1:24 pm, nos...@ab-katrinedal.dk (Niels Jørgen Kruse) wrote:
[snip]

> Who wants to bury x86? Intel probably wanted to replace x86 with the
> Itanium ISA, but that battle was lost and they have had enough attention
> from monopoly overseeing authorities that they are probably not that
> eager anymore.

I suspect a significant number of CS types would heartily approve of
the death of x86, seeing the x86 ISA as an inelegant collection of ISA
extensions--unnecessarily difficult to decode, excessively limited in
its ability to express ILP (2-register operations, modest register
count), with support for unnecessarily complex operations.

Given that a modern, performance-oriented, general-purpose
processor implementation will have significant complexity in
implementation, it is not unreasonable for the ISA to be
relatively complex; but x86 wastes some of its complexity
budget. Engineering-oriented people would tend to be
uncomfortable about such waste.


Paul A. Clayton
just a technophile
reachable as 'paaronclayton'
at "embarqmail.com"

Benny Amorsen

Dec 6, 2007, 2:26:32 PM
"Paul A. Clayton" <paaron...@earthlink.net> writes:

> I suspect a significant number of CS types would heartily approve of
> the death of x86, seeing the x86 ISA as an inelegant collection of ISA
> extensions--unnecessarily difficult to decode, excessively limited in
> its ability to express ILP (2-register operations, modest register
> count), with support for unnecessarily complex operations.

x86 code is dense. That's a large advantage for general-purpose code.
Are there any RISCs that are as dense, apart from ARM with Thumb?


/Benny

Stephen Fuld

Dec 6, 2007, 3:26:27 PM
Anton Ertl wrote:
> Hilal <mont...@gmail.com> writes:
>> On Dec 6, 1:25 am, j...@cix.co.uk (John Dallman) wrote:
>>>> I want to propose another view of this opportunity. Why not put in
>>>> two cores that use two different ISAs?
>>> How does this help with FSB congestion?
>>>
>> It does not, but it will avoid it, because probably the GPU will stay
>> where it is today, with its very fast RAM, instead of being inside and
>> using the relatively slow main memory.
>
> If people want to incur the cost of a separate GPU, they can do that
> without also incuring the cost of additional incompatible cores.
>
> Concerning the high-bandwidth RAM, a possible solution would be to
> have one part of the RAM (say, 1GB) in the form of high-bandwidth RAM
> soldered on the motherboard, as is done now for GPUs, and the rest
> expandable in slots, as is done now in CPUs. That approach can be
> used to deal with both the bandwidth problem of many-core CPUs and of
> integrated CPU/GPU combos. I guess there would be some swapping
> mechanism between the high-bandwidth and the low-bandwidth RAM.

Isn't there a pin count issue? If you have a separate GPU chip, it has
lots of pins dedicated to the high bandwidth RAM interface. If you
integrate the GPU onto the CPU die, you lose the separate pins of the
separate GPU chip. Isn't that a bandwidth limiting problem irrespective
of what RAM chips you use?

Niels Jørgen Kruse

Dec 6, 2007, 5:05:03 PM
Benny Amorsen <benny+...@amorsen.dk> wrote:

The difference is not all that much. I checked the sizes of a few
universal binaries:

            ppc       i386   % ppc/i386
EyeTV     16239092  15764404    3.01
iTunes    14873344  14577664    2.03
MacSOUP     936764    838868   11.7
Camino    13597232  13674444   -0.565
Cyberduck    58320     59608   -2.16
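(Reading the first row: 16239092 / 15764404 is about 1.0301, so the
ppc slice is about 3.0% larger; the negative entries are where the
i386 slice is the larger one.)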

Hilal

Dec 6, 2007, 5:09:52 PM
Ken Hagan

>The essential problem with your "asymmetric multi-core" proposal
>is that many of the processes that you run on it can't use half
>the processing power that you've paid for. If you *are* willing to
>take a 50% cut in performance, then a better idea is to JIT-compile
>x86 code into the ISA of your choice and run everything on a more
>conventional symmetric multi-core system.

You can look at it in another way: you have 2 cores dedicated to your
apps that are recompiled, while everything else that is not very
important is fighting over the old core. Also, as I mentioned earlier,
when we have a simple ISA we can have simpler cores, many of them; why
not 2 x86 cores and 4 new cores and BIGGER caches!?

I agree that in the beginning it sounds a bit scary, but I believe it
is just a marketing thing. Intel used to sell MHz; today they sell
cores; tomorrow they sell ISAs!!!

>But as I said, all recent x86 chips do that in hardware anyway.

Yeah, I know, but at the expense of complexity.

Paul A. Clayton


>I suspect a significant number of CS types would heartily approve of
>the death of x86, seeing the x86 ISA as an inelegant collection of ISA
>extensions--unnecessarily difficult to decode, excessively limited in
>its ability to express ILP (2-register operations, modest register
>count), with support for unnecessarily complex operations.
>

>Given that a modern, performance-oriented, general-purpose
>processor implementation will have significant complexity in
>implementation, it is not unreasonable for the ISA to be
>relatively complex; but x86 wastes some of its complexity
>budget. Engineering-oriented people would tend to be
>uncomfortable about such waste.

You read my mind!
Wouldn't that even give us better compilers to optimize code? I have
no real grounds for this, but my optimizing-compilers teacher and my
compiler-construction teacher both seem to dislike x86 code.

Benny Amorsen


>x86 code is dense. That's a large advantage for general-purpose code.
>Are there any RISCs that are as dense, apart from ARM with Thumb?

In what way? Do you mean that you can do more with one instruction?
Then that's CISC, right?
Isn't a smaller, simpler instruction faster to process, and won't it
also produce less complex structures, which then leave more space to
keep data local in the CPU? Why not have >100 registers, and a big
cache to support fast context switching?

Stephen Fuld


>Isn't there a pin count issue? If you have a separate GPU chip, it has
>lots of pins dedicated to the high bandwidth RAM interface. If you
>integrate the GPU onto the CPU die, you lose the separate pins of the
>separate GPU chip. Isn't that a bandwidth limiting problem irrespective
>of what RAM chips you use?

interesting remark, =D

ken...@cix.compulink.co.uk

Dec 6, 2007, 5:11:20 PM

> Back in the day, BBC micros sometimes ran with different-architecture
> co- processors. I don't know how shared the OS environment was.

From memory, the Z80 co-processor ran CP/M in its own memory (up to
64K), with the rest of the machine being reduced to an I/O processor.
You could change the OS of the BBC by sticking in an extension ROM. I
seem to remember 6502 and 6809 add-on boards being available as well.
Mind you, I never actually owned one. All the above is from what I
remember finding out when I was considering a replacement for my Video
Genie.


Ken Young

Benny Amorsen

Dec 7, 2007, 4:34:33 AM
nos...@ab-katrinedal.dk (Niels Jørgen Kruse) writes:

> The difference is not all that much. I checked the sizes of a few
> universal binaries:
>
>             ppc       i386   % ppc/i386
> EyeTV     16239092  15764404    3.01
> iTunes    14873344  14577664    2.03
> MacSOUP     936764    838868   11.7
> Camino    13597232  13674444   -0.565
> Cyberduck    58320     59608   -2.16

Total size or text size?


/Benny

Niels Jørgen Kruse

Dec 7, 2007, 5:31:30 AM
Benny Amorsen <benny+...@amorsen.dk> wrote:

It is the size as reported by lipo, e.g.:

Mnementh:~ njk$ !497
lipo -detailed_info /Applications/iTunes.app/Contents/MacOS/iTunes
Fat header in: /Applications/iTunes.app/Contents/MacOS/iTunes
fat_magic 0xcafebabe
nfat_arch 2
architecture ppc
cputype CPU_TYPE_POWERPC
cpusubtype CPU_SUBTYPE_POWERPC_ALL
offset 4096
size 14873344
align 2^12 (4096)
architecture i386
cputype CPU_TYPE_I386
cpusubtype CPU_SUBTYPE_I386_ALL
offset 14880768
size 14577664
align 2^12 (4096)

Ken Hagan

Dec 7, 2007, 6:31:04 AM
On Thu, 06 Dec 2007 22:09:52 -0000, Hilal <mont...@gmail.com> wrote:

> Wouldn't that even give us better compilers to optimize code? I have
> no real grounds for this, but my optimizing-compilers teacher and my
> compiler-construction teacher both seem to dislike x86 code.

For Itanium, Intel bet the farm on being able to create an ISA that
they could write a brilliant compiler for. They had a budget of
squillions, a completely free choice of ISA, some of the best
fabrication facilities on the planet, and no shortage of brains.

Perhaps x86 isn't as bad as the CS types say. It seems to be very
hard to do *much* better.

Alex Colvin

Dec 7, 2007, 8:36:19 AM
How many processors are in a typical PC these days?

I'm including the main CPU(s), GPU, disk microcontrollers. I think there's
still one in the keyboard. USB devices need one, but I don't know about
masters. The battery?

I assume only the CPU and GPU are programmable.

--
mac the naïf

Anton Ertl

Dec 7, 2007, 12:25:56 PM
nos...@ab-katrinedal.dk (Niels Jørgen Kruse) writes:

>Benny Amorsen <benny+...@amorsen.dk> wrote:
>> x86 code is dense. That's a large advantage for general-purpose code.
>> Are there any RISCs that are as dense, apart from ARM with Thumb?
>
>The difference is not all that much. I checked the sizes of a few
>universal binaries:
>
>             ppc       i386   % ppc/i386
>EyeTV     16239092  15764404    3.01
>iTunes    14873344  14577664    2.03
>MacSOUP     936764    838868   11.7
>Camino    13597232  13674444   -0.565
>Cyberduck    58320     59608   -2.16

Last time I measured this
<2004Aug2...@mips.complang.tuwien.ac.at>, this was the result:

.text section sizes:
33744 gzip_1.3.5-9_m68k.deb
39040 gzip_1.3.5-9_i386.deb
43940 gzip_1.3.5-9_sparc.deb
46044 gzip_1.3.5-9_arm.deb
48672 gzip_1.3.5-9_powerpc.deb
51468 gzip_1.3.5-9_hppa.deb
58864 gzip_1.3.5-9_alpha.deb
71872 gzip_1.3.5-9_mipsel.deb
71904 gzip_1.3.5-9_mips.deb
88256 gzip_1.3.5-9_ia64.deb

MitchAlsup

Dec 7, 2007, 1:20:47 PM

In particular, the AMD processors do not translate the x86 codes, but
cache 'decorated' versions of them directly. The P4 did translate
opcodes, but even this path is no longer being followed.

x86 processors are micro-architected with data paths that directly
deal with the vast majority of x86-isms in a moderate-length pipeline
based on RISC-styled principles. But, for example, the address
generation unit is configured to perform an x86 memory address
calculation (less segmentation) in a single cycle, and with
segmentation in one additional cycle. Thus, one can add a 16-bit,
32-bit, or 64-bit register operand, a scaled 16-bit, 32-bit, or 64-bit
register operand, and an 8-bit, 16-bit or 32-bit displacement
immediate in a single cycle, giving a 16-bit, 20-bit, 32-bit, or
64-bit linear address. So, while the underlying principles are
RISC-like, the actual execution is far from RISC-like.
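For reference, the calculation being described is the x86
base + scaled index + displacement form; a toy model of the arithmetic
(my illustration, not Mitch's code):

#include <stdint.h>
#include <stdio.h>

/* ea = base + (index << scale) + displacement, folded into one AGU pass */
static uint64_t effective_address(uint64_t base, uint64_t index,
                                  unsigned scale_log2, int32_t disp)
{
    return base + (index << scale_log2) + (int64_t)disp;
}

int main(void)
{
    /* the [B + I*8 + 0x10] case of the ADD [B+I<<s+D],reg example below */
    printf("0x%llx\n",
           (unsigned long long)effective_address(0x1000, 3, 3, 0x10));
    return 0;
}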

Over in the integer side of things, a single opcode in a reservation
station could cause a memory operation to access memory, take the
resulting memory bit pattern and perform an integer operation upon it,
and then send it back to the memory system {e.g. ADD [B+I<<s+D],reg}.
Even integer division(s) in the most modern AMD chips operate without
microcode. FP, SSE, and special but non-microcoded instructions just
have less hairy decompositions. Thus, in AMD processors, it is the
reservation stations that do much of the decomposition (i.e.
translation) of x86 into RISC-like pipelined operations. I strongly
suspect that the P3 derivatives do likewise. Finally, there is always
microcode in case the processor microarchitects can't figure out how
to decompose and pipeline a hairy instruction or context-changing
activity.

RISC -- no, definitely not; RISC-like -- yes, definitely.

Mitch

Anton Ertl

Dec 7, 2007, 1:24:47 PM
Stephen Fuld <S.F...@PleaseRemove.att.net> writes:

>Anton Ertl wrote:
>> Concerning the high-bandwidth RAM, a possible solution would be to
>> have one part of the RAM (say, 1GB) in the form of high-bandwidth RAM
>> soldered on the motherboard, as is done now for GPUs, and the rest
>> expandable in slots, as is done now in CPUs. That approach can be
>> used to deal with both the bandwidth problem of many-core CPUs and of
>> integrated CPU/GPU combos. I guess there would be some swapping
>> mechanism between the high-bandwidth and the low-bandwidth RAM.
>
>Isn't there a pin count issue? If you have a separate GPU chip, it has
>lots of pins dedicated to the high bandwidth RAM interface. If you
>integrate the GPU onto the CPU die, you lose the separate pins of the
>separate GPU chip.

But with separate packages you also need pins on each package for
talking to the other package and its RAM. That's probably the reason
that the Niagara does not support multi-socket systems: They would
have had to reduce the number of RAM channels of the Niagara chip.

At the end of the day the amount of bandwidth you can get is limited
by the number of pins, so dividing the PUs into multiple packages does
not necessarily help. It will help, if there is exploitable locality
in the RAM access. NUMA systems and the division between CPU and GPU
RAM indicate that there is such locality, but OTOH the choppiness I
see when a game shows new scenery indicates that this does not work as
well as one might hope.

So while high-end systems will still use separate packages for quite a
long time to come, a single package with a high-bandwidth interface to
RAM will be good enough for the mid-range and the higher integration
will provide some additional advantages.

They will kill the market for discrete graphics with 32-bit and 64-bit
memory interfaces, and probably also 128-bit memory interfaces, just
as the motherboard graphics with separate graphics memory never
materialized (apart from server boards), even though the capability is
there in some chipsets.

To really reap the benefits, I guess we will see all the memory
soldered on the board, not a fast-fixed+slow-expandable version like
you cited above. Those who want it expandable (i.e., servers) will
eventually go with something like FB-DIMMs, as that's the only way to
provide high bandwidth/pin and still be expandable.

Chris Thomasson

Dec 7, 2007, 3:46:22 PM
"Anton Ertl" <an...@mips.complang.tuwien.ac.at> wrote in message
news:2007Dec...@mips.complang.tuwien.ac.at...

> Hilal <mont...@gmail.com> writes:
>>On Dec 6, 1:25 am, j...@cix.co.uk (John Dallman) wrote:
>>> > I want to propose another view of this opportunity. Why not put in
>>> > two cores that use two different ISAs?
>>>
>>> How does this help with FSB congestion?
>>>
>>
>>It does not, but it will avoid it, because probably the GPU will stay
>>where it is today, with its very fast RAM, instead of being inside and
>>using the relatively slow main memory.
[...]

I was wondering when the chip vendors are going to start creating their own
memory that will be directly integrated into their overall processor
designs.

http://groups.google.com/group/comp.arch/msg/499b4f64ee887b72

Is that at all feasible?

John Dallman

Dec 7, 2007, 4:16:00 PM
In article <1MmdnRbg9rwV7sXa...@pipex.net>,
ken...@cix.compulink.co.uk () wrote:
> andrew-...@areilly.bpc-users.org (Andrew Reilly) wrote:
> > Back in the day, BBC micros sometimes ran with
> > different-architecture co-processors. I don't know how shared the
> > OS environment was.
> From memory the (BBC Micro) Z80 co-processor ran CP/M in its own
> memory (up to 64K) with the rest of the machine being reduced to
> an I/O processor.

That was the case for all of them. There was an interface, the "Acorn
Tube", which went through several software iterations before it finally
worked. Acorn definitely sold co-processors with the 6502, and the
NS32016. I'm not sure if they sold a Z80 one in the end. They did plan
to do so, but the Torch Z80 pack came out first. That plugged into the
Tube interface, but used a quite different protocol with most of the
CP/M emulation in a ROM on the BBC side, to free up RAM on the Z80 side.

Back in the eighties, I worked on the Robo BitStik CAD system, which
mainly ran on Apple ][ hardware, but there was a version for the BBC
Micro. Acorn paid for the porting and bought an initial consignment of
the special three-axis joystick, but they never got anywhere much with
selling it. There wasn't much market amongst BBC Micro users for a CAD
system that /required/ a second processor, and whose software cost more
than the hardware.

John Dallman

Dec 7, 2007, 4:16:00 PM
In article
<efcc7d96-384f-4f68...@b40g2000prf.googlegroups.com>,
already...@yahoo.com () wrote:

> On Dec 6, 12:06 pm, Hilal <montha...@gmail.com> wrote:
> > But yeah, you raised a lot of interesting comments, and maybe this
> > idea is overshadowed too much by the failure of Itanium?
> His main point has little to do with Itanium*.

Indeed, but Itanium's lack of overwhelming success is likely to make
Intel very cautious about introducing another "revolutionary!" ISA.

> * I wouldn't call Itanium a failure. It's just not as huge a success
> as many hype-lovers predicted 10 years ago.

It has successfully replaced PA-RISC, although it failed to prevent x86
(Windows and Linux) from taking over the workstation market. However,
that is about as far as Itanium has got. The big SGI systems have a lot
of raw power, but they don't seem to be selling many of them now. And HP
have a big problem if Intel ever gets fed up with Itanium.

John Dallman

Dec 7, 2007, 4:16:00 PM
In article <op.t2wpq...@khagan.ttx>, K.H...@thermoteknix.co.uk
(Ken Hagan) wrote:

> I suppose one could make the argument that this *has* been thought
> about. It has been possible for a long time to build a system with
> an x86 CPU and a different one. I believe Apple even did so, with
> the x86 on some kind of add-in card.

There have been quite a few. There was a Z-80 CP/M card for the Apple ][
series, which may actually have been the single largest-selling CP/M
hardware platform. There was an external x86 add-on for early Macs,
"MacCharlie", and multiple cards from various vendors once Macs acquired
expansion slots. All have passed away.

The OP's point, which has some validity, is that the currently
embarrassingly large transistor budgets of modern processors allow you
to have two CPU cores with different ISA in the same socket, which is
much cheaper to produce than an add-in card. However, that isn't
obviously enough for the idea to succeed: it just exposes more subtle
problems.

Thomas Womack

Dec 7, 2007, 4:30:08 PM
In article <2007Dec...@mips.complang.tuwien.ac.at>,
Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:

>But with separate packages you also need pins on each package for
>talking to the other package and its RAM. That's probably the reason
>that the Niagara does not support multi-socket systems: They would
>have had to reduce the number of RAM channels of the Niagara chip.

I went to an interesting conference on cryptographic computing this
autumn in Vienna, 'SHARCS 2007', where James Hughes from Sun presented
'Victoria Falls'. This is a system with either two or four modified
Niagara-2 processors, with the modification consisting of replacing
the 10Gb-ethernet hardware with a single very-high-speed coherency
link. Still has 16 FBDIMMs per processor (four on each of the memory
channels); for the four-socket system there's a big router ASIC
through which all communications go.

The chips are packaged in 1831-pin BGAs, which is about as many pins
as can be arranged without heroic Power4-like measures.

Tom

Hilal

Dec 7, 2007, 4:45:38 PM
On Dec 7, 10:16 pm, j...@cix.co.uk (John Dallman) wrote:
> The OP's point, which has some validity, is that the currently
> embarrassingly large transistor budgets of modern processors allow you
> to have two CPU cores with different ISA in the same socket, which is
> much cheaper to produce than an add-in card. However, that isn't
> obviously enough for the idea to succeed: it just exposes more subtle
> problems.
>
> --
> John Dallman, j...@cix.co.uk, HTML mail is treated as probable spam.

To me it seemed a very simple solution, and too good to be true;
thanks to you all I have a bigger picture, hopefully closer to
reality, if we can ever get there :D

I will keep this floating in my mind and hope that we can together
think of something good.
Solutions don't need to be complex, and we tend to forget that; that's
what my professors once said to me.

Another question: does Intel own their ISA? I mean, can anyone make
their own CPU that runs x86 code?

thank you.

Paul Gotch

Dec 7, 2007, 5:37:01 PM
Chris Thomasson <cri...@comcast.net> wrote:
> I was wondering when the chip vendors are going to start creating their own
> memory that will be directly integrated into their overall processor
> designs.

There are several issues. Firstly, the kind of process you need for
logic and the kind you need for DRAM are quite different, and you
can't get SRAM up to the densities required. IBM are probably at the
edge of the curve here; for example, they use eDRAM for their vast L3
caches on POWER MCMs.

The second problem is that programming NUMA machines is hard with
languages designed for coherent shared-memory machines. Again, IBM
have the Cell processor, and possibly someone will come up with a
novel way of programming it one of these days. Personally I think
something inspired by OCaml or Haskell is the way forward.

-p
--
"Unix is user friendly, it's just picky about who its friends are."
- Anonymous
--------------------------------------------------------------------

Chris Thomasson

Dec 7, 2007, 6:09:07 PM
"Paul Gotch" <pa...@at-cantab-dot.net> wrote in message
news:jCs*dl...@news.chiark.greenend.org.uk...

> Chris Thomasson <cri...@comcast.net> wrote:
>> I was wondering when the chip vendors are going to start creating their
>> own
>> memory that will be directly integrated into their overall processor
>> designs.
>
> There are several issues,
[...]

>
> The second problem is that programming NUMA machines is hard with
> languages
> designed for coherent shared memory machines. [...]

IMHO, it's not that hard. You need to learn how to use thread
affinity masks to bind threads to CPUs that have high locality of
reference wrt the distance between them and the local memory they have
access to. You can use C/POSIX, and some platform-specific/assembly
language, to program highly efficient NUMA programs.
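A minimal sketch of that affinity-binding approach, using the GNU
pthread_setaffinity_np extension (the assumption that cores 0 and 1
sit on different NUMA nodes is mine, for illustration):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static void *worker(void *arg)
{
    int core = *(int *)arg;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    /* Pin this thread so it keeps touching memory local to its node. */
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    /* ... allocate and work on node-local memory here ... */
    printf("worker pinned to core %d\n", core);
    return NULL;
}

int main(void)
{
    pthread_t t[2];
    int cores[2] = { 0, 1 };  /* assumed: one core per NUMA node */
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, &cores[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return 0;
}

(Compile with -pthread; a production version would check return codes
and use something like libnuma for node-local allocation.)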

Stephen Sprunk

Dec 7, 2007, 6:13:30 PM
"John Dallman" <j...@cix.co.uk> wrote in message
news:memo.2007120...@jgd.compulink.co.uk...

> In article
> <efcc7d96-384f-4f68...@b40g2000prf.googlegroups.com>,
> already...@yahoo.com () wrote:
>> On Dec 6, 12:06 pm, Hilal <montha...@gmail.com> wrote:
>> > But yeah, you raised a lot of interesting comments, and maybe this
>> > idea is shadowed by the failure of Itanium too much?
>>
>> His main point has little to do with Itanium*.
>
> Indeed, but Itanium's lack of overwhelming success is likely to make
> Intel very cautious about introducing another "revolutionary!" ISA.

The last dozen failures at it didn't stop them from building Itanic...

>> * I wouldn't call Itanium the failure. It's just not such a huge
>> success as many hype-lovers predicted 10 years ago.
>
> It has successfully replaced PA-RISC,

Only because PA-RISC went away and customers had to buy _something_, and
even then many of them defected to x86 after learning the hard way about
getting themselves locked into proprietary systems. Due to the competitive
market, x86 will never go away as long as customers keep paying for it; if
any player tries to leave, someone else will step in to take the money and
new players will emerge.

> although it failed to prevent x86 (Windows and Linux) from taking over the
> workstation market. However, that is about as far as Itanium has got. The
> big
> SGI systems have a lot of raw power, but they don't seem to be selling
> many
> of them now.

SGI hasn't sold many of _anything_ for a long time; they've still got
top-notch engineers, but their sales and management leave a lot to be
desired.

> And HP have a big problem if Intel ever gets fed up of Itanium.

No, they'll just transition to x86 and use AMD chips exclusively for a while
(to get back at Intel for abandoning them, though it won't last). HP has
learned how to handle an ISA transition, and I'm sure they'd also study up
on how Apple handled things after Motorola and then IBM decided not to build
the PPC products they were depending on.

S

--
Stephen Sprunk "God does not play dice." --Albert Einstein
CCIE #3723 "God is an inveterate gambler, and He throws the
K5SSS dice at every possible opportunity." --Stephen Hawking

del

Dec 8, 2007, 10:26:38 PM
Stephen Sprunk wrote:
>I'm sure they'd also
> study up on how Apple handled things after Motorola and then IBM decided
> not to build the PPC products they were depending on.
>
> S
>
This is a total distortion of the way I heard it. The way I heard it
is that IBM decided not to build the PPC products after Apple decided
not to pay the necessary number of dollars for the chips and the
development thereof.

After all, IBM was willing to do chips for Sony and Microsoft.

del

Dec 8, 2007, 10:29:48 PM
The answer to your first question is "never", since commodity memory
is so much cheaper. A quibble: "start" should be "go back to", since
IBM at least used to do exactly that.

Stephen Fuld

Dec 7, 2007, 11:11:48 PM
Anton Ertl wrote:
> Stephen Fuld <S.F...@PleaseRemove.att.net> writes:
>> Anton Ertl wrote:
>>> Concerning the high-bandwidth RAM, a possible solution would be to
>>> have one part of the RAM (say, 1GB) in the form of high-bandwidth RAM
>>> soldered on the motherboard, as is done now for GPUs, and the rest
>>> expandable in slots, as is done now in CPUs. That approach can be
>>> used to deal with both the bandwidth problem of many-core CPUs and of
>>> integrated CPU/GPU combos. I guess there would be some swapping
>>> mechanism between the high-bandwidth and the low-bandwidth RAM.
>> Isn't there a pin count issue? If you have a separate GPU chip, it has
>> lots of pins dedicated to the high bandwidth RAM interface. If you
>> integrate the GPU onto the CPU die, you lose the separate pins of the
>> separate GPU chip.
>
> But with separate packages you also need pins on each package for
> talking to the other package and its RAM.

Yes, but at least on current Intel systems, the GPU is hung off the
MCM chips (also known as the north bridge) and so takes no additional
pins on the CPU. Now certainly the CPU-GPU traffic uses the CPU FSB,
but presumably the benefits of having additional pins on the GPU that
are dedicated to its memory outweigh the cost to the CPU FSB of the
CPU-GPU traffic.

snip

> At the end of the day the amount of bandwidth you can get is limited
> by the number of pins, so dividing the PUs into multiple packages does
> not necessarily help.

Right. As you say below, the answer is the answer to most computer
architecture questions - "It depends!" :-)


> It will help, if there is exploitable locality
> in the RAM access. NUMA systems and the division between CPU and GPU
> RAM indicate that there is such locality,

Yes.


> but OTOH the choppiness I
> see when a game shows new scenery indicates that this does not work as
> well as one might hope.

I am not a gamer so I don't encounter this, though I don't doubt that
some people do. I think you are right that it indicates that things are
not as good as one might hope, but it doesn't indicate that a different
topology would be better.

> So while high-end systems will still use separate packages for quite a
> long time to come, a single package with a high-bandwidth interface to
> RAM will be good enough for the mid-range and the higher integration
> will provide some additional advantages.

Intel offers "embedded graphics" that, as I understand it, does this.
But as you say, it is only for lower to perhaps mid range systems.
Indeed, if there is some new high-bandwidth interface available, it
isn't clear (at least to me) that it wouldn't be better to still have a
separate GPU chip that also uses this new interface to speed up its
operations to its local memory.

> They will kill the market for discrete graphics with 32-bit and 64-bit
> memory interfaces, and probably also 128-bit memory interfaces, just
> as the motherboard graphics with separate graphics memory never
> materialized (apart from server boards), even though the capability is
> there in some chipsets.

While you could be right, I fall back on "It depends!" :-)


>
> To really reap the benefits, I guess we will see all the memory
> soldered on the board, not a fast-fixed+slow-expandable version like
> you cited above.

I didn't cite this. In fact, I worked on a system that had two
different speeds of memory, each separately addressable. It sure was a
mess! :-(

mike3

Dec 8, 2007, 12:14:42 AM
On Dec 7, 2:16 pm, j...@cix.co.uk (John Dallman) wrote:
> In article <op.t2wpq0rbm60...@khagan.ttx>, K.Ha...@thermoteknix.co.uk
> (Ken Hagan) wrote:
> > I suppose one could make the argument that this *has* been thought
> > about. It has been possible for a long time to build a system with
> > an x86 CPU and a different one. I believe Apple even did so, with
> > the x86 on some kind of add-in card.
>
> There have been quite a few. There was a Z-80 CP/M card for the Apple ][
> series, which may actually have been the single largest-selling CP/M
> hardware platform. There was an external x86 add-on for early Macs,
> "MacCharlie", and multiple cards from various vendors once Macs acquired
> expansion slots. All have passed away.
>
> The OP's point, which has some validity, is that the currently
> embarrassingly large transistor budgets of modern processors allow you
> to have two CPU cores with different ISA in the same socket, which is
> much cheaper to produce than an add-in card. However, that isn't
> obviously enough for the idea to succeed: it just exposes more subtle
> problems.
>

But why can't the problems be overcome? Does this mean there is no
way to dislodge x86 from the desktop?

Niels Jørgen Kruse

Dec 8, 2007, 4:15:50 AM
del <delc...@gmail.com> wrote:

> Stephen Sprunk wrote:
> >I'm sure they'd also
> > study up on how Apple handled things after Motorola and then IBM decided
> > not to build the PPC products they were depending on.
> >
> > S
> >
> This is a total distortion of the way I heard it. The way I heard it is
> IBM decided not to build the PPC products after Apple decided not to pay
> the necessary number of dollars for the chips and the development
> thereof.

Apple decided Intel's offer was better and IBM overestimated how hard it
would be for Apple to switch.

Anton Ertl

unread,
Dec 8, 2007, 5:26:27 AM12/8/07
to
del <delc...@gmail.com> writes:
>Stephen Sprunk wrote:
>>I'm sure they'd also
>> study up on how Apple handled things after Motorola and then IBM decided
>> not to build the PPC products they were depending on.
>>
>> S
>>
>This is a total distortion of the way I heard it. The way I heard it is
>IBM decided not to build the PPC products after Apple decided not to pay
> the necessary number of dollars for the chips and the development
>thereof.

Also, IIRC Motorola released some new CPU revision that would have
been a good fit for the iBook even after Apple announced the switch
(sorry, I don't remember the details). And then there is PA Semi. So
it seems that the first decision in this area was Apple's.

Anders....@kapsi.spam.stop.fi.invalid

unread,
Dec 8, 2007, 5:50:39 AM12/8/07
to
Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:

> Also, IIRC Motorola released some new CPU revision that would have
> been a good fit for the iBook even after Apple announced the switch

Freescale announced the e600-based CPUs around that time, but they
weren't actually released until this year, I think. Of course Apple's
switch may have significantly affected the development of these chips.

-Anders

ned...@yahoo.com

unread,
Dec 8, 2007, 6:37:37 AM12/8/07
to
Hello,

mike3 wrote:
> But why can't the problems be overcome? Does this mean there is no
> way to dislodge x86 from the desktop?

Why would you want to dislodge x86? ISA doesn't matter, except in
extremely low power applications (and Intel is starting to push into
that with Silverthorne). Plus the development tools for x86 make it
very attractive.

Ned

Anton Ertl

unread,
Dec 8, 2007, 5:49:19 AM12/8/07
to
Stephen Fuld <S.F...@PleaseRemove.att.net> writes:
>Anton Ertl wrote:
>> But with separate packages you also need pins on each package for
>> talking to the other package and its RAM.
>
>Yes, but at least on current Intel systems, the GPU is hung off the MCM
>chips (also known as the north bridge)

You mean MCH (memory controller hub), not MCM (multi-chip module).

>and so takes no additional pins
>on the CPU.

But the PCIe interface takes additional pins on the GPU, and the GPU
is the chip that has the really high memory bandwidth (up to 512 bits
wide, resulting in up to 128GB/s bandwidth). Also, the MCH needs to
talk to the CPU, the GPU, the south bridge, and memory, so for a given
number of pins fewer pins are left for memory (this might be
alleviated by needing fewer power pins for the MCH than for the CPU).
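
For concreteness, the arithmetic behind such figures is just interface
width times per-pin data rate. A quick sketch in C (the 2e9 transfers
per second per pin is an assumed GDDR3-era figure, chosen to match the
128GB/s number above, not a datasheet value):

    #include <stdio.h>

    /* Peak bandwidth = (width in bits / 8) bytes * per-pin transfer rate.
       The rate below is an assumption, not a measured value. */
    int main(void)
    {
        double rate = 2e9;  /* assumed transfers per second per data pin */
        int widths[] = { 64, 128, 256, 384, 512 };

        for (int i = 0; i < 5; i++)
            printf("%3d-bit bus: %6.1f GB/s peak\n",
                   widths[i], widths[i] / 8.0 * rate / 1e9);
        return 0;
    }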

>> but OTOH the choppiness I
>> see when a game shows new scenery indicates that this does not work as
>> well as one might hope.
>
>I am not a gamer so I don't encounter this, though I don't doubt that
>some people do. I think you are right that it indicates that things are
>not as good as one might hope, but it doesn't indicate that a different
>topology would be better.

Well, a unified memory architecture with the combined bandwidth of CPU
and GPU should be better, no? Of course, the question is if that is
possible for the same cost.

>> So while high-end systems will still use separate packages for quite a
>> long time to come, a single package with a high-bandwidth interface to
>> RAM will be good enough for the mid-range and the higher integration
>> will provide some additional advantages.
>
>Intel offers "embedded graphics" that, as I understand it, does this.

Do you mean integrated graphics, i.e., graphics integrated on the
motherboard chipset? Every chipset vendor offers that: Intel,
AMD/ATI, Nvidia, VIA. As I mentioned, some of these chips have (had?)
the option of adding some local GPU memory, but motherboard
manufacturers did not take advantage of that.

>But as you say, it is only for lower to perhaps mid range systems.

Quite low end at the moment. They have not killed the market for
discrete low-end graphics cards yet.

While I'm discussing memory bandwidth and GPUs: the width was a
significant indicator of performance; in particular, many 128-bit
cards were also offered with 64-bit memory interfaces (sometimes with
an SE suffix, sometimes without), and the performance sucked.

However, then the following puzzling thing happened: The latest
high-end graphics cards have 384 bit (Nvidia 8800GTX) and 512 bit (ATI
2900XT) wide memory interfaces. Recently both companies presented
graphics cards in the $200 range that have similar GPU chips, but only
256-bit wide memory interfaces. Yet, the performance was mostly on
par with the high-end cards. Why are these cards not as sensitive to
memory bandwidth as the lower-end cards, and why did their
manufacturers implement the wider interfaces originally? One thing I
read is that for the ATI chip the reduction in bit width was achieved
by having 8 32-bit channels instead of 8 64-bit ones, so maybe the
number of channels is more important than their width.
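
One way to see how channel count could matter independently of total
width is a toy conflict model - purely illustrative, and not a claim
about how real GDDR memory controllers actually schedule requests:

    #include <stdio.h>
    #include <stdlib.h>

    /* Toy model: each cycle, a batch of requests picks random channels;
       a channel serves at most one request per cycle, the rest stall. */
    static double served_fraction(int channels, int requests, int cycles)
    {
        int served = 0;
        for (int c = 0; c < cycles; c++) {
            int busy[64] = { 0 };
            for (int r = 0; r < requests; r++) {
                int ch = rand() % channels;
                if (!busy[ch]) { busy[ch] = 1; served++; }
            }
        }
        return (double)served / ((double)cycles * requests);
    }

    int main(void)
    {
        printf(" 8 channels: %.0f%% of requests served per cycle\n",
               100 * served_fraction(8, 8, 100000));
        printf("16 channels: %.0f%% of requests served per cycle\n",
               100 * served_fraction(16, 8, 100000));
        return 0;
    }

In this model throughput is limited by conflicts rather than by
per-channel width, which would be consistent with the observation that
halving the channel width cost little.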

Erik Trulsson

unread,
Dec 8, 2007, 7:33:20 AM12/8/07
to

Although the x86 ISA is ugly and has several flaws, it actually does not
constitute a major bottleneck for performance. It is more difficult to
write a good compiler for x86 than it is for a RISC CPU - but by now
the compiler writers have learned all the tricks to generate good code
for x86.
Going to some better ISA would only give minor performance benefits - even
assuming you could match Intel's and AMD's process technology in other aspects.
To keep this other ISA faster than x86, you will also have to match Intel's
and AMD's R&D to keep up with their improvements in process technology - or persuade
both of them to switch to this new ISA. Intel tried switching ISA recently (to Itanium).
One reason why they failed was because AMD kept improving their x86
offerings meanwhile, meaning customers didn't have any reason to stop using x86.

The main reason x86 currently is so successful on the desktop has nothing to
do with performance. It is because there is such a large number of
applications that are only available for x86 (mostly running Microsoft Windows.)

To dislodge the x86 from the desktop you will need some way to have all these
applications (or superior ones) run (and run well) on whatever system is supposed
to replace x86.


--
<Insert your favourite quote here.>
Erik Trulsson
ertr...@student.uu.se

Paul Gotch

unread,
Dec 8, 2007, 8:22:33 AM12/8/07
to
Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
> (sorry, I don't remember the details). And then there is PA Semi.

Which would not have been able to guarantee the volumes that Intel can.
There was more to Apple's decision than just which architecture is best.

> it seems that the first decision in this area was Apple's.

IBM could not produce a PPC970 class processor that had suitable thermals
for notebooks and the Motorola 7xxx series had run out of steam. Apple had
to do something and the only sensible way out where they could get chips
instantly instead of perhaps in a couple of years time was Intel.

Hilal

unread,
Dec 8, 2007, 8:55:28 AM12/8/07
to
On Dec 8, 1:33 pm, Erik Trulsson <ertr1...@student.uu.se> wrote:
> Going to some better ISA would only give minor performance benefits - even
> assuming you could match Intel's and AMD's process technology in other aspects.
> To keep this other ISA faster than x86, you will also have to match Intel's
> and AMD's R&D to keep up with their improvements in process technology - or persuade
> both of them to switch to this new ISA. Intel tried switching ISA recently (to Itanium).
> One reason why they failed was because AMD kept improving their x86
> offerings meanwhile, meaning customers didn't have any reason to stop using x86.
>

Itanium wasn't supposed to be a desktop PC, because translation was
slow in the beginning - a problem this solution solves.
The biggest question that came up here is in what way will this new
ISA be better?

1: will it really give better performance compared to x86?
2: Can we make it as fast as x86 but much simpler, so that we can
have a combination: MORE cores, BIGGER caches, MANY registers?

Also, can we find a new way to handle data other than the
nondeterministic cache, like creating a local cache inside the cache,
where programs can use it as main memory?

Or we could use these big caches for fast context switching?

there are endless opportunities I see here! By making one thing
simpler and expanding on the things where it counts. Apparently many
of you think that ISA is not the biggest bottleneck, and I don't
disagree, but then what if we made it a bit "slower" and used the area
we get to boost something else? And at the same time we are fazing out
x86!

> The main reason x86 currently is so successful on the desktop has nothing to
> do with performance. It is because there is such a large number of
> applications that are only available for x86 (mostly running Microsoft Windows.)
>
> To dislodge the x86 from the desktop you will need some way to have all these
> applications (or superior ones) run (and run well) on whatever system is supposed
> to replace x86.
>

That's the idea, you don't need to recompile everything! It is like
running 32-bit in 64-bit. Recompile only what you need; this will give
it a dedicated CORE!!!
The problem is with proprietary software: you need to wait for them,
while open source will recompile much faster.
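
For what it's worth, the dispatch step such an OS would need is small.
A minimal Linux-flavoured sketch - the EM_NEWISA tag and the core
numbering are invented for illustration, since no such two-ISA CPU
exists:

    #define _GNU_SOURCE
    #include <elf.h>
    #include <sched.h>
    #include <stdio.h>
    #include <sys/types.h>

    /* Hypothetical machine tag for the new ISA; cores 0-1 are assumed
       to run x86 code and cores 2-3 the new ISA. */
    #define EM_NEWISA 0x9999

    static int pin_to_core(pid_t pid, int core)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        return sched_setaffinity(pid, sizeof set, &set);
    }

    /* Called by the loader after reading the program's ELF header. */
    int dispatch_by_isa(pid_t pid, const Elf32_Ehdr *hdr)
    {
        switch (hdr->e_machine) {
        case EM_386:
            return pin_to_core(pid, 0);   /* any legacy x86 core */
        case EM_NEWISA:
            return pin_to_core(pid, 2);   /* any new-ISA core */
        default:
            fprintf(stderr, "no core speaks ISA %d\n", hdr->e_machine);
            return -1;
        }
    }

Everything else (shared memory, shared disk) would be unchanged, which
is the point of the proposal.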


best regards,

del

unread,
Dec 9, 2007, 9:21:43 AM12/9/07
to
Paul Gotch wrote:
> Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>> (sorry, I don't remember the details). And then there is PA Semi.
>
> Which would not have been able to guarantee the volumes that Intel can.
> There was more to Apple's decision than just which architecture is best.
>
>> it seems that the first decision in this area was Apple's.
>
> IBM could not produce a PPC970 class processor that had suitable thermals
> for notebooks and the Motorola 7xxx series had run out of steam. Apple had
> to do something and the only sensible way out where they could get chips
> instantly instead of perhaps in a couple of years time was Intel.
>
> -p.
That is silly. "IBM could not..."? I would believe didn't or wouldn't
but couldn't? Caught in the Jobs RDF?

John Dallman

unread,
Dec 8, 2007, 9:28:00 AM12/8/07
to
In article
<170f7ab6-4b0d-4f2c...@o42g2000hsc.googlegroups.com>,
mont...@gmail.com (Hilal) wrote:

> Solutions don't need to be complex and we tend to forget that; that's
> what my professors said to me once.

There is the question of appropriate complexity. A processor that just
executes one instruction at a time, with no pipelining, and deals
directly with memory, with no caches, is hopelessly slow with current
fabrication technologies. So you need pipelines and caches, which add a
lot of complexity, or you need to invent a better method of making
circuits. And if you do the latter, the guys with the existing complex
designs will try very hard to take advantage of it.

Try an analogy. Your university's engineering department could
undoubtedly build a gas turbine engine that was as good as an early
Whittle or Junkers design, for much less money than those projects took
(in real terms) because of the amount of knowledge on the subject that
is now readily accessible. That does not mean, however, that they could
build an engine that is as good as Pratt & Whitney, Rolls-Royce or
General Electric could manage today. At least, not without spending the
same kind of money, and building up equivalent facilities and skills.
Yet those engines work in basically the same way. See the analogy?

Twenty-five years ago, it was possible to build a competitive
general-purpose CPU really quite simply and cheaply. MIPS, SPARC and so on
demonstrated this. Now, although ISAs have not changed very much, the
CPUs that run them have been developed vastly, and the price of entry to
the market has risen with the state of development. There are fewer CPU
manufacturers than there used to be, and attempts to enter the market
are normally aimed at a specific and specialised niche.

So if you want to change the market, you need a significantly different
principle. The aero engine business makes a useful analogy again: in
1935, there were far fewer makers of piston engines than there had been
in 1910, and the engines were highly developed, making it hard for a new
company to compete. The gas turbine provided a new principle, which
allowed fairly simple designs to exceed the performance of highly
developed old-style engines.

I'm suspicious - although I don't know for sure, of course - that
overturning the currently dominant players in the CPU field requires
some change that is more drastic than just an ISA that is "cleaner", or
"better" in some other way while remaining within the same general style.

What do I mean by general style? Probably the von Neumann machine with
instructions that execute serially, at least in concept. Remember that
the point of the von Neumann machine - which is a practical way of
creating a Turing machine - is that it is simple to analyse, build and
test. The EDGE/TRIPS project looks like at least an attempt to move
beyond that model, in using parallelism without threading. I have no
idea if it's practical, or if it's sufficiently different to fulfil its
objectives. And commercial success is another matter entirely.

> Another question, does Intel own their ISA? I mean can anyone make
> their own CPU that runs the x86 code?

There seem to be various patents and ownership issues that make it
complicated. AMD and Intel may be deadly enemies for marketing purposes,
but they have quite a bit of cross-licensing too.

John Dallman

unread,
Dec 8, 2007, 10:02:00 AM12/8/07
to
In article <5rvnlaF...@mid.individual.net>, delc...@gmail.com
(del) wrote:

> Paul Gotch wrote:
> > IBM could not produce a PPC970 class processor that had suitable
> > thermals for notebooks
> That is silly. "IBM could not..."? I would believe didn't or
> wouldn't but couldn't?

I'd believe that IBM could not without doing a bunch of specialised
development, and would not do that for a price that Apple would pay.
Low-power versions of CPUs seem to have a fair few tricks of their own.
When Transmeta appeared, Intel and AMD started taking power usage much
more seriously, but it still took them a year or more to make serious
progress.

John Dallman

unread,
Dec 8, 2007, 10:02:00 AM12/8/07
to
In article
<423b06de-7c84-4118...@a35g2000prf.googlegroups.com>,
mont...@gmail.com (Hilal) wrote:

> Itanium wasn't supposed to be a desktop PC

Yes, it was. Intel didn't hand out 16,000 development machines, costing
maybe $10,000 each after the CPU and motherboard upgrades they had to go
through, for a platform that was intended for specialised servers.

> The biggest question that came up here is in what way will this new
> ISA be better?
>
> 1: will it really give better performance compared to x86?
> 2: Can we make it as fast as x86 but much simpler, so that we can
> have a combination: MORE cores, BIGGER caches, MANY registers?

The only way to demonstrate that this is possible is to demonstrate an
example. Go ahead. You need to convince someone to invest a billion
dollars. They are likely to be much harder to convince than a newsgroup:
we can be interested by mere technical advances, but you are failing to
do that.

> Also, can we find a new way to handle data other than the
> nondeterministic cache, like creating a local cache inside the cache,
> where programs can use it as main memory?

Try explaining this again? I don't understand it.

> Or we could use these big caches for fast context switching?

Context switching is not, in fact, a big problem on a modern general-
purpose system. Context switches are rare enough, in terms of the
number of instructions executed, that they aren't a big overhead in
terms of time any more. You need to remember that you're talking about
designing systems that do thousands of VAX MIPS these days; some things
are different.
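
The supporting arithmetic, with assumed order-of-magnitude numbers
rather than measurements:

    #include <stdio.h>

    /* Rough overhead estimate for a 2007-era desktop; all three inputs
       are plausible assumptions, not measured values. */
    int main(void)
    {
        double hz       = 3e9;   /* 3 GHz core */
        double switches = 1000;  /* context switches per second */
        double cost     = 10000; /* cycles per switch, incl. cache refill */

        printf("overhead: %.2f%% of cycles\n",
               100 * switches * cost / hz);
        return 0;
    }

which comes out at around a third of a percent - noise compared with
the other places the cycles go.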

> there are endless opportunities I see here! By making one thing
> simpler and expanding on the things where it counts. Apparently many
> of you think that ISA is not the biggest bottleneck, and I don't
> disagree, but then what if we made it a bit "slower" and used the area
> we get to boost something else?

You need to demonstrate the benefits, not just talk about them. It's all
much harder than you seem to think. What you're saying would have made
perfect sense twenty years ago, but things have changed. This is one of
the problems with the teaching of computer science these days: the
commercial market moves too fast for students to stay current.

> And at the same time we are fazing out x86!

This will not happen without substantial side-benefits. It is not, in
fact, a practical and virtuous thing in itself. If you feel that it is,
you aren't understanding how the market works.

Paul Gotch

unread,
Dec 8, 2007, 1:28:40 PM12/8/07
to
John Dallman <j...@cix.co.uk> wrote:
> I'd believe that IBM could not without doing a bunch of specialised
> development, and would not do that for a price that Apple would pay.

Indeed IBM's methodology is heavily based around a standard cell library and
as much automation as they can get away with. So they trade some performance
and power efficiency for fewer engineers and faster turnaround time.

Getting a really low power chip requires full custom layout and a tight
process. Or something semicustom and not mainstream like
http://www.intrinsity.com/ - the headroom you get out of using something like
Fast14 does depend critically on exactly how the core was designed to start
off with, though.

Stephen Fuld

unread,
Dec 8, 2007, 1:38:19 PM12/8/07
to
Anton Ertl wrote:
> Stephen Fuld <S.F...@PleaseRemove.att.net> writes:
>> Anton Ertl wrote:
>>> But with separate packages you also need pins on each package for
>>> talking to the other package and its RAM.
>> Yes, but at least on current Intel systems, the GPU is hung off the MCM
>> chips (also known as the north bridge)
>
> You mean MCH (memory controller hub), not MCM (multi-chip module).

Yes, of course. Thanks for the correction.

>> and so takes no additional pins
>> on the CPU.
>
> But the PCIe interface takes additional pins on the GPU, and the GPU
> is the chip that has the really high memory bandwidth (up to 512 bits
> wide, resulting in up to 128GB/s bandwidth). Also, the MCH needs to
> talk to the CPU, the GPU, the south bridge, and memory, so for a given
> number of pins fewer pins are left for memory (this might be
> alleviated by needing fewer power pins for the MCH than for the CPU).

That is the assumption I was making. The presumption is that the
performance cost of spending GPU pins on the PCI-E interface is less
than what would be lost by giving up the extra memory interface pins
that a separate GPU provides.


>>> but OTOH the choppiness I
>>> see when a game shows new scenery indicates that this does not work as
>>> well as one might hope.
>> I am not a gamer so I don't encounter this, though I don't doubt that
>> some people do. I think you are right that it indicates that things are
>> not as good as one might hope, but it doesn't indicate that a different
>> topology would be better.
>
> Well, a unified memory architecture with the combined bandwidth of CPU
> and GPU should be better, no? Of course, the question is if that is
> possible for the same cost.

Right. I presume it isn't. As you said, the metric is essentially
number of pins and the huge number of "extra" memory interface pins
provided by the GPU seems to be the major factor. If you assume that
the demand for memory bandwidth is infinite, and is limited by the
number of pins, then as long as the PCI-E interface isn't a limiting
factor (I guess that is equivalent to saying the demand for memory
bandwidth is infinite), a separate chip, with its extra pins, will
always be a winner. But see below.

>
>>> So while high-end systems will still use separate packages for quite a
>>> long time to come, a single package with a high-bandwidth interface to
>>> RAM will be good enough for the mid-range and the higher integration
>>> will provide some additional advantages.
>> Intel offers "embedded graphics" that, as I understand it, does this.
>
> Do you mean integrated graphics, i.e., graphics integrated on the
> motherboard chipset?

Yes. Again, sorry for the error.

> Every chipset vendor offers that: Intel,
> AMD/ATI, Nvidia, VIA. As I mentioned, some of these chips have (had?)
> the option of adding some local GPU memory, but motherboard
> manufacturers did not take advantage of that.
>
>> But as you say, it is only for lower to perhaps mid range systems.
>
> Quite low end at the moment. They have not killed the market for
> discrete low-end graphics cards yet.

Right.

> While I'm at discussing memory bandwidth and GPUs: The width was a
> significant indicator of performance; in particular, many 128-bit
> cards were also offered with 64-bit memory interfaces (sometimes with
> an SE suffix, sometimes without), and the performance sucked.
>
> However, then the following puzzling thing happened: The latest
> high-end graphics cards have 384 bit (Nvidia 8800GTX) and 512 bit (ATI
> 2900XT) wide memory interfaces. Recently both companies presented
> graphics cards in the $200 range that have similar GPU chips, but only
> 256-bit wide memory interfaces. Yet, the performance was mostly on
> par with the high-end cards. Why are these cards not as sensitive to
> memory bandwidth as the lower-end cards, and why did their
> manufacturers implement the wider interfaces originally? One thing I
> read is that for the ATI chip the reduction in bit width was achieved
> by having 8 32-bit channels instead of 8 64-bit ones, so maybe the
> number of channels is more important than their width.

Interesting. So latency counts here too. Also, perhaps they have
reached a limit (when using current CPUs, GPUs, PCI-e, etc.) of demand
for memory bandwidth at what can be supplied by 256 pins. This is
certainly not my area of expertise, so I am just speculating here.

Paul Gotch

unread,
Dec 8, 2007, 1:39:06 PM12/8/07
to
John Dallman <j...@cix.co.uk> wrote:
> This will not happen without substantial side-benefits. It is not, in
> fact, a practical and virtuous thing in itself. If you feel that it is,
> you aren't understanding how the market works.

Personally I think the only thing that's going to get rid of x86 desktop PCs
is the desktop PC going the way of the dodo. This is not as far-fetched as
you think. Staying in the x86 space the number of laptops shipped has been
increasing year on year for a while now.

Jumping out of the x86 space the number of feature and smart mobile phones
(which are really ARM based computers) shipped dwarfs the number of PCs
shipped.

Why do you think that Intel are trying to expand down and get into what they
term "Mobile Internet Devices"? Incidentally most of these things run some form
of customised Linux after it turned out that virtually no market existed for
Windows based "UMPCs".

Anne & Lynn Wheeler

unread,
Dec 8, 2007, 2:10:26 PM12/8/07
to

Paul Gotch <pa...@at-cantab-dot.net> writes:
> Indeed IBM's methodology is heavily based around a standard cell library
> and as much automation as they can get away with. So they trade some
> performance and power efficiency for fewer engineers and faster turnaround
> time.
>
> Getting a really low power chip requires full custom layout and a tight
> process. Or something semicustom and not mainstream like
> http://www.intrinsity.com/ - the headroom you get out of using something
> like Fast14 does depend critically on exactly how the core was designed to
> start off with, though.

in (at least) the 70s and 80s there were quite a few internal people
working on VLSI design tools. I saw some amount doing work with the Los
Gatos VLSI group. In fact, the Los Gatos VLSI tools group originally
created the corporate pascal compiler in support of their VLSI tool
development (which was eventually released as product on both mainframe
and RISC).

in the wake of the company going thru its downturn, including going into
the red in 1992 ... quite a bit of the internal tool groups were
dissolved and transition made to standard vendor tools. during this
period, there were some number of people from VLSI tool groups doing
technology transfer to outside vendors ... as sort of last task prior to
their position being eliminated (some then showed up as employees of
the vendors they had been working with).

In the case of the Los Gatos group ... the bldg was leveled and the land
sold off for housing development.

John Dallman

unread,
Dec 8, 2007, 2:27:00 PM12/8/07
to
In article <w7d*YK...@news.chiark.greenend.org.uk>,
pa...@at-cantab-dot.net (Paul Gotch) wrote:
> Why do you think that Intel are trying to expand down and get into
> what they term "Mobile Internet Devices"?

Not wanting to be left out of what might become a big market. But it's a
"might". Firefox et al are doing a good job of persuading commercial
websites to be more flexible about what browsers they support, but
there's still all this stuff in video codecs, flash players, .NET, Java,
etc, etc. that's already available on the current desktop/laptop
platforms, but getting it all onto J.Random MID could still be a
struggle.

Stephen Sprunk

unread,
Dec 8, 2007, 3:56:41 PM12/8/07
to
"del" <delc...@gmail.com> wrote in message
news:5ruh92F...@mid.individual.net...

> Stephen Sprunk wrote:
>> I'm sure they'd also study up on how Apple handled things after Motorola
>> and
>> then IBM decided not to build the PPC products they were depending on.
>
> This is a total distortion of the way I heard it. The way I heard it is
> IBM decided not to build the PPC products after Apple decided not to pay
> the necessary number of dollars for the chips and the development thereof.

I didn't state _why_ IBM and Motorola made those decisions; of course it had
to do with the money that would be made on the deal. Apple
wouldn't/couldn't pay them to make what they needed, so they didn't get
made. That doesn't change the point I was making that Apple had to do a
transition to x86, and HP will be studying that if they know what's good for
them long-term.

If HP paid Intel enough money, they'd not ditch Itanic. But HP can't afford
to keep it afloat on their own, so it's going to sink sooner or later and HP
will find itself in the same position as Apple did with PowerPC.

> After all, IBM was willing to do chips for Sony, and Microsoft

Sony and Microsoft have significantly higher volume for their respective
chips than Apple. They also have the advantage they just need a single chip
made exactly the same (well, lower prices over time would be nice) for
several years, and after that they iterate again on the next chip. Apple
needed continual performance increases with a dozen different models being
sold at any given time, which is a lot more expensive per-chip on the R&D
side.

Stephen Sprunk

unread,
Dec 8, 2007, 4:22:06 PM12/8/07
to
"Hilal" <mont...@gmail.com> wrote in message
news:423b06de-7c84-4118...@a35g2000prf.googlegroups.com...

> On Dec 8, 1:33 pm, Erik Trulsson <ertr1...@student.uu.se> wrote:
>> Going to some better ISA would only give minor performance
>> benefits - even assuming you could match Intel's and AMD's
>> process technology in other aspects. To keep this other ISA being
>> faster than x86, you will also have to match Intel's and AMD's R&D
>> to keep up with their improvements in process technology - or
>> persuade both of them to switch to this new ISA. Intel tried
>> switching ISA recently (to Itanium). One reason why they failed
>> was because AMD kept improving their x86 offerings meanwhile,
>> meaning customers didn't have any reason to stop using x86.
>
> Itanium wasn't supposed to be a desktop PC, because translation was
> slow in the beginning - a problem this solution solves.

Itanic _was_ supposed to be a total replacement for x86; Intel wanted to
ditch x86 because there was competition there keeping profits down, and they
wanted to lock everyone into a proprietary architecture so they could charge
more.

The reality is that once an open standard is developed and achieves market
dominance, market forces will ensure that a proprietary solution will
_never_ manage to unseat it, even if it's temporarily superior. x86 isn't
quite open, due to all the patent licensing, but there's enough suppliers
that it's close enough for the effect to apply.

> The biggest question that came up here is in what way will this new
> ISA be better?
>
> 1: will it really give better performance compared to x86?

Unless you come up with some ground-breaking improvements in circuit design,
your competitive design will never beat x86 on performance per dollar. And
you can bet that as soon as you ship your first sample chips, AMD and Intel
will get some, copy your improvements, and apply them to their x86 chips --
and kill you on price because they have a much larger volume to spread their
R&D costs across.

> 2: Can we make it as fast as x86 but much simpler, so that we
> can have a combination: MORE cores, BIGGER caches, MANY
> registers?

There are already chips that beat x86 on all those factors; they lose in the
marketplace except when targeted at very specific niches. Nobody even tries
competing for the mass market anymore because it's financial suicide.

> there are endless opportunities I see here! By making one thing
> simpler and expanding on the things where it counts. Apparently
> many of you think that ISA is not the biggest bottleneck, and I don't
> disagree, but then what if we made it a bit "slower" and used the
> area we get to boost something else? And at the same time we are
> fazing out x86!

You seem to think that phasing (not fazing) out x86 is a goal in itself, as
if x86 were somehow "bad". It's ugly, sure, but it's simply not important
anymore since the industry now knows how to work around its oddities to
build fast chips.

>> The main reason x86 currently is so successful on the desktop has
>> nothing to do with performance. It is because there is such a large
>> number of applications that are only available for x86 (mostly
>> running Microsoft Windows.)
>>
>> To dislodge the x86 from the desktop you will need some way to
>> have all these applications (or superior ones) run (and run well) on
>> whatever system is supposed to replace x86.
>
> That's the idea, you don't need to recompile everything! It is like
> running 32-bit in 64-bit. Recompile only what you need; this will give
> it a dedicated CORE!!!
> The problem is with proprietary software: you need to wait for them,
> while open source will recompile much faster.

The problem with the idea of putting two dissimilar cores on your chip is
that it will be beaten (in either mode) on performance by chips with two x86
cores that cost less (due to less R&D and higher volume).

There was a time when the fastest x86 machine was actually an Alpha running
an x86 software emulator -- the Alpha was that much faster. However, very
few people bought Alphas for running x86 software because they cost more
(which was mostly a result of the low volume, a vicious cycle). The market
for native Alpha software (which absolutely murdered x86 software) was
near-zero because nobody had the machines, and the entire thing is now just
an interesting footnote in history.

mike3

unread,
Dec 8, 2007, 6:59:29 PM12/8/07
to
On Dec 8, 8:02 am, j...@cix.co.uk (John Dallman) wrote:
<snip>

> You need to demonstrate the benefits, not just talk about them. It's all
> much harder than you seem to think. What you're saying would have made
> perfect sense twenty years ago, but things have changed. This is one of
> the problems with the teaching of computer science these days: the
> commercial market moves too fast for students to stay current.
>

However, is it so hard as to be insurmountable? And even if not, would
there be any _real_ advantage if some non-x86 architecture were used?

mike3

unread,
Dec 8, 2007, 7:01:28 PM12/8/07
to
On Dec 8, 2:22 pm, "Stephen Sprunk" <step...@sprunk.org> wrote:
> "Hilal" <montha...@gmail.com> wrote in message
<snip>

> Itanic _was_ supposed to be a total replacement for x86; Intel wanted to
> ditch x86 because there was competition there keeping profits down, and they
> wanted to lock everyone into a proprietary architecture so they could charge
> more.
>
> The reality is that once an open standard is developed and achieves market
> dominance, market forces will ensure that a proprietary solution will
> _never_ manage to unseat it, even if it's temporarily superior. x86 isn't
> quite open, due to all the patent licensing, but there's enough suppliers
> that it's close enough for the effect to apply.
>

A *proprietary* standard. What about if one were to develop a new *open*
standard? Could that replace the old open standard?

Stephen Sprunk

unread,
Dec 9, 2007, 1:55:04 AM12/9/07
to
"mike3" <mike...@yahoo.com> wrote in message
news:e37b2450-33ef-4a33...@s12g2000prg.googlegroups.com...

Only if it were backwards-compatible with the old standard _and_ offered
significant benefits to those that adopted it even if nobody else did.

PowerPC was sort of an attempt to unseat x86 and other CISCs, and while it's
wildly successful in the embedded space, as is ARM, it's failed in the
desktop space.

In contrast, look at the various upgrades that Ethernet has received.
Thicknet to Thinnet to UTP hubs to 100Mb/s to fiber to switching to gigabit
and now even ten gigabit, it's remained completely backwards compatible so
that devices from the 70s can still be connected (with the appropriate
hardware converters) to modern networks -- and ancient networks can be
upgraded piecemeal and receive immediate benefits.

Look at desktop PCs over the last three decades and you'll see that the only
glue that holds the entire industry together is x86 itself. Disks, memory,
graphics, expansion slots, OSes, drivers, I/O ports, etc. all change but the
one thing that has always been constant is the ISA. Even that hasn't been
perfectly constant as Intel and, recently, AMD have made various additions
over the years -- but nobody has ever made _any_ headway at adding a
completely new ISA to the game.

Chris Thomasson

unread,
Dec 9, 2007, 3:06:25 AM12/9/07
to
""Niels Jørgen Kruse"" <nos...@ab-katrinedal.dk> wrote in message
news:1i8srfm.4hhan9gfbi94N%nos...@ab-katrinedal.dk...

Why would IBM think that Apple would eat whatever they fed to them? IBM is
smarter than that.

Chris Thomasson

unread,
Dec 9, 2007, 3:12:16 AM12/9/07
to
"del" <delc...@gmail.com> wrote in message
news:5ruhevF...@mid.individual.net...
> Chris Thomasson wrote:
>> "Paul Gotch" <pa...@at-cantab-dot.net> wrote in message
>> news:jCs*dl...@news.chiark.greenend.org.uk...
>>> Chris Thomasson <cri...@comcast.net> wrote:
>>>> I was wondering when the chip vendors are going to start creating their
>>>> own
>>>> memory that will be directly integrated into their overall processor
>>>> designs.
>>>
>>> There are several issues,
>> [...]
>>>
>>> The second problem is that programming NUMA machines is hard with
>>> languages
>>> designed for coherent shared memory machines. [...]
>>
>> IMHO, it's not that hard. You need to learn how to use thread affinity
>> masks to bind them to CPUs that have a high locality of reference wrt the
>> distance between them and the local memory they have access to. You can
>> use C/POSIX, and some platform-specific/assembly-language to program
>> highly efficient NUMA programs.
> The answer to your first question is "never" since commodity memory is so
> much cheaper.

Very true. However, I don't think that commodity memory can be directly
integrated into future processor designs. I think that the chip vendor would
have an easier time designing their own dedicated on-processor/per-core
memory systems; think NUMA.


> A quibble is "start" should be "go back to" since IBM at least used to do
> exactly that.

Thanks for that information. IMVHO, I think the chip vendors could possibly
benefit from going back to that line of thinking. How can an independent
memory vendor create special-case integrated designs that fit in with
specific CPU vendors' designs, unless the memory and the chips are created by
the same company?
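
To make the affinity idea quoted above concrete, a minimal sketch using
Linux's libnuma - this assumes a NUMA-capable kernel and libnuma
installed (link with -lnuma):

    #include <numa.h>
    #include <stdio.h>

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "no NUMA support here\n");
            return 1;
        }

        int node = 0;            /* node chosen for this worker */
        numa_run_on_node(node);  /* bind the thread to that node's CPUs */

        /* Allocate the working set from the same node's local memory,
           so the cores that touch it stay close to it. */
        size_t bytes = 64 * 1024 * 1024;
        void *buf = numa_alloc_onnode(bytes, node);
        if (buf == NULL)
            return 1;

        /* ... memory-intensive work on buf goes here ... */

        numa_free(buf, bytes);
        return 0;
    }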

Hilal

unread,
Dec 9, 2007, 6:05:17 AM12/9/07
to
On Dec 8, 4:02 pm, j...@cix.co.uk (John Dallman) wrote:

> montha...@gmail.com (Hilal) wrote:
> > Also, can we find a new way to handle data other than the
> > nondeterministic cache, like creating a local cache inside the cache,
> > where programs can use it as main memory?
>
> Try explaining this again? I don't understand it.

I was looking once at Intel Xscale, which is based on ARM; they had a
feature called creating on-chip RAM: you could put data there and know
that no conflict could evict it - you would get a small RAM in there.

link : http://www.intel.com/design/intelxscale/273436.htm
Chapter 3.3.2.4 page 55.

I just thought it would be good to have such a feature in a multi-task
environment, so that after each context-switch your data is still in
cache.
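
In portable terms the XScale feature amounts to a small software-managed
scratchpad. A sketch of how application code might use one -
cache_lock_region is an invented name standing in for whatever the OS or
firmware would expose (on XScale it is a coprocessor operation sequence):

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical service: lock a region into the data cache so that
       no eviction - and no context switch - can displace it. */
    extern int cache_lock_region(void *addr, size_t len);

    /* A hot lookup table we want to survive every context switch. */
    static uint32_t hot_table[1024];

    int init_hot_table(void)
    {
        for (size_t i = 0; i < 1024; i++)
            hot_table[i] = (uint32_t)(i * 2654435761u);  /* sample data */

        /* After this, the table behaves like a tiny on-chip RAM:
           accesses hit deterministically no matter what other tasks
           do to the rest of the cache. */
        return cache_lock_region(hot_table, sizeof hot_table);
    }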

Chris Thomasson

unread,
Dec 9, 2007, 7:01:25 AM12/9/07
to

"Hilal" <mont...@gmail.com> wrote in message
news:b946ecd7-c5c2-4d24...@s8g2000prg.googlegroups.com...

> On Dec 8, 4:02 pm, j...@cix.co.uk (John Dallman) wrote:
>> montha...@gmail.com (Hilal) wrote:
>> > Also, can we find a new way to handle data other than the
>> > nondeterministic cache, like creating a local cache inside the cache,
>> > where programs can use it as main memory?
>>
>> Try explaining this again? I don't understand it.
>
> I was looking once at Intel Xscale, which is based on ARM; they had a
> feature called creating on-chip RAM

[...]

Think of on-chip membanks that are partitioned among the chip's groups of
cores:

[16 core NUMA system:]
CPU-GROUP1:1-4=MEMBANK:1
CPU-GROUP2:5-8=MEMBANK:2
CPU-GROUP3:9-12=MEMBANK:3
CPU-GROUP4:13-16=MEMBANK:4
[...]

Cache-coherency would be WEAK by default, and VERY WEAK wrt communication
between CPU-GROUPS. Highly-distributed NUMA is the programming model. Memory
barriers and coherency requirements are explicitly programmed at any level
higher than the default! Make a programmable cache-coherency system designed
for a NUMA environment; think dedicated FPGA-like logic that -only- deals
with the programming details which pertain to the macro/micro cache-coherency
issues within a distributed architecture.
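
What "explicitly programmed" ordering would look like to software is
already familiar from weak atomics: under such a default, nothing is
ordered between groups until the program says so. A minimal
producer/consumer sketch in C, using the stdatomic interface as a
stand-in for whatever primitives such hardware would expose:

    #include <stdatomic.h>
    #include <stdint.h>

    static uint64_t payload;
    static atomic_int ready;

    void producer(void)      /* runs on, say, CPU-GROUP1 */
    {
        payload = 42;
        /* release: publish the payload before the flag becomes visible */
        atomic_store_explicit(&ready, 1, memory_order_release);
    }

    uint64_t consumer(void)  /* runs on, say, CPU-GROUP4 */
    {
        /* acquire: don't touch the payload until the flag is seen */
        while (!atomic_load_explicit(&ready, memory_order_acquire))
            ;
        return payload;      /* guaranteed to read 42 */
    }

Everything not bracketed by such a pair would be free to stay incoherent
between the groups, which is where the hardware savings would come from.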

Bernd Paysan

unread,
Dec 8, 2007, 6:18:50 PM12/8/07
to
Anton Ertl wrote:
> One question with these ideas is if socketing the CPU limits bandwidth
> significantly, and if we will have to go with soldered CPUs as well as
> soldered RAM eventually.

Soldering CPUs has another advantage: sockets and the corresponding packages
aren't for free. PC prices have been stable in the past, but today, they
are dropping further and further - an entry level Athlon64 starts at ~30
Euro, and a corresponding mainboard at ~40 Euro (and it isn't just the
strong Euro). If you can omit the socketable package and instead drop in a
cheap solderable BGA, you can get even cheaper. I don't know how much these
sockets and the packages cost on top of the cheaper ones, since my chips -
if at all - are only put into sockets when I test engineering samples.

Sometimes we buy expensive sockets for cheap standard packages, sometimes we
bond the engineering samples into easy to socket expensive ceramic packages
(we recycle these packages to reduce the cost impact), and in the last
package, we just soldered the package on the board.

And the main reason for doing so is signal quality. This project is a
high-end audio amplifier, so any sort of noise in the output spoils it.
However, signal quality also is important for high-speed interfaces; that's
why graphic cards with their soldered memory and chip can have a
significantly higher bandwidth than the socketed CPUs connected to RAM
modules. Distance also is important; on graphic cards the RAM is much
closer to the CPU as on the mainboard.

Intel had already demonstrated what they have in mind for the future:
stacked chips where the memory is sitting straight on one side of the CPU.
For more memory surface, I could imagine that you connect the RAM chips
only on one side and stick them vertically onto the CPU (the other side is
for cooling). The remaining pins of the CPU then would be IO and power
connections only.

Since the majority of users don't expand their PCs during their lifetime
(except peripherals), providing a cheaper offering without that possibility
could be beneficial.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/

Casper H.S. Dik

unread,
Dec 9, 2007, 8:35:13 AM12/9/07
to
j...@cix.co.uk (John Dallman) writes:

>It has successfully replaced PA-RISC, although it failed to prevent x86
>(Windows and Linux) from taking over the workstation market. However,
>that is about as far as Itanium has got. The big SGI systems have a lot
>of raw power, but they don't seem to be selling many of them now. And HP
>have a big problem if Intel ever gets fed up of Itanium.

Which will likely only be at the point where HP can no longer sue
Intel for many billions for breach of contract.

It may very well be that it will be HP which pulls the plug and not
Intel.

Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.

Casper H.S. Dik

unread,
Dec 9, 2007, 8:37:11 AM12/9/07
to
nos...@ab-katrinedal.dk (=?ISO-8859-1?Q?Niels_J=F8rgen_Kruse?=) writes:

>Apple decided Intel's offer was better and IBM overestimated how hard it
>would be for Apple to switch.

And the use of Intel CPUs may also have something to do with an increased
ability to run Windows binaries without translation.

Windows application compatibility will help Apple sell into new markets.

Anton Ertl

unread,
Dec 9, 2007, 8:01:35 AM12/9/07
to
"Stephen Sprunk" <ste...@sprunk.org> writes:
>That doesn't change the point I was making that Apple had to do a
>transition to x86, and HP will be studying that if they know what's good for
>them long-term.

Why should HP study Apple when they have lots of more relevant
experience in-house? I.e., the transitions to HP-PA, the transitions
from VAX and MIPS to Alpha, the transitrion to MIPS in the (Tandem)
NonStop group, the aborted transition of that business to Alpha, and
the transition of HP-PA, Alpha, and MIPS to IA-64.

["del" wrote:]


>> After all, IBM was willing to do chips for Sony, and Microsoft
>
>Sony and Microsoft have significantly higher volume for their respective
>chips than Apple. They also have the advantage they just need a single chip
>made exactly the same (well, lower prices over time would be nice) for
>several years, and after that they iterate again on the next chip. Apple
>needed continual performance increases with a dozen different models being
>sold at any given time, which is a lot more expensive per-chip on the R&D
>side.

Apple needs continual performance increases, Sony and Microsoft need
continual cost reductions; I don't think that there's that much
difference in development costs resulting from these two goals.

As for the "dozens of models", Intel does not do that, either; they
have one microarchitecture, and derive a few power/performance levels
from that by varying process parameters and probably some circuit
design options, the rest is done by binning and different pin-outs. I
don't think Intel spends that much on these things, and Apple does not
even use all the levels that Intel builds.

- anton
--
M. Anton Ertl Some things have to be seen to be believed
an...@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html

Niels Jørgen Kruse

unread,
Dec 9, 2007, 9:04:54 AM12/9/07
to
Casper H.S. Dik <Caspe...@Sun.COM> wrote:

> nos...@ab-katrinedal.dk (=?ISO-8859-1?Q?Niels_J=F8rgen_Kruse?=) writes:
>
> >Apple decided Intel's offer was better and IBM overestimated how hard it
> >would be for Apple to switch.
>
> And the use of Intel CPUs may also have something to do with an increased
> ability to run Windows binaries without translation.
>
> Windows application compatibility will help Apple sell into new markets.

That ability has no doubt helped the increase in market share, but it is
not obvious that it was viewed as a net advantage when the decision was
made. Consider the fate of OS/2.

--
Mvh./Regards, Niels Jørgen Kruse, Vanløse, Denmark

John Dallman

unread,
Dec 9, 2007, 10:10:00 AM12/9/07
to
In article
<fe9f237a-8aa5-4f5c...@w28g2000hsf.googlegroups.com>,
mont...@gmail.com (Hilal) wrote:

> On Dec 8, 4:02 pm, j...@cix.co.uk (John Dallman) wrote:
> > montha...@gmail.com (Hilal) wrote:
> > > Also, can we find a new way to handle data other than the
> > > nondeterministic cache, like creating a local cache inside the
> > > cache, where programs can use it as main memory?
> > Try explaining this again? I don't understand it.

> I was looking once at Intel Xscale, which is based on ARM; they had a
> feature called creating on-chip RAM: you could put data there and know
> that no conflict could evict it - you would get a small RAM in there.

> ...


> I just thought it would be good to have such a feature in a multi-task
> environment, so that after each context-switch your data is still in
> cache.

What happens if several programs want to do this? The trouble with such
features is that once you have more than one task plus the OS, they
become a source of problems, as you have to virtualise them, adding
complexity.

John Dallman

unread,
Dec 9, 2007, 10:10:00 AM12/9/07
to
In article <2007Dec...@mips.complang.tuwien.ac.at>,
an...@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

> Why should HP study Apple when they have lots of more relevant
> experience in-house? I.e., the transitions to HP-PA, the transitions
> from VAX and MIPS to Alpha, the transition to MIPS in the (Tandem)
> NonStop group, the aborted transition of that business to Alpha, and
> the transition of HP-PA, Alpha, and MIPS to IA-64.

Absolutely. They do have a potential problem, though, with motivating
ISVs to convert *again*. What do you feel would be the major
customer advantages of HP-UX on x86-64 over Solaris or Linux on x86-64?
Because there aren't any that are obvious to me.

Anton Ertl

unread,
Dec 9, 2007, 10:58:25 AM12/9/07
to
j...@cix.co.uk (John Dallman) writes:
[potential HP migration from IA-64 to x86-64]

>Absolutely. They do have a potential problem, though, with motivating
>ISVs to convert *again*. What would do you feel would be the major
>customer advantages of HP-UX on x86-64 over Solaris or Linux on x86-64?
>Because there aren't any that are obvious to me.

What are the advantages of HP-UX on IA-64 over Linux on IA-64 or
x86-64? None that would convince me. But I guess there are some for
other people:

- Probably old HP/UX binaries will only run on the new hardware under
HP/UX.

- Sysadmins that are using HP/UX exclusively will find HP/UX more
comfortable on any hardware.

But none of that will gain new customers, and a percentage of the
existing ones will be lost on every purchase, and more on every
migration, so the future for HP/UX looks bleak.

BTW, HP/UX is not the only platform they would have to migrate:
There's also VMS, NonStop, and what remains of the Tru64 business; MPE
seems to be pretty dead.

John Dallman

unread,
Dec 9, 2007, 11:26:00 AM12/9/07
to
In article
<a8f15b5d-e367-4ac6...@e25g2000prg.googlegroups.com>,
mike...@yahoo.com (mike3) wrote:

> However, is it so hard as to be insurmountable?

I don't know.

I do observe that most ISAs that are designed are commercial failures,
but that if they get established, there are network effects that tend to
keep them going and competitive, which don't simply depend on their
qualities as abstract ISAs. See the IBM 360 through to Z series for an
example.

Note that of the RISC architectures of the eighties and early nineties,
PowerPC is the only one still going strong in general-purpose
computing. Alpha is circling the drain, MIPS is confined to specific
embedded markets, and SPARC is evolving towards a specialised
many-thread engine.

> And even if not, would there be any _real_ advantage if some non-x86
> architecture were used?

It isn't obvious to me that an ISA can be designed that would present a
compelling case as a replacement for x86. Its problems have been more or
less overcome, because - due to the network effects - there was so much
money to be made by doing so. As I said above, I suspect that something
very different indeed may be required to take advantage of growing
transistor counts, something that isn't much like any of the processor
we know today. I don't know what it would look like.

John Dallman

unread,
Dec 9, 2007, 11:35:00 AM12/9/07
to

> BTW, HP/UX is not the only platform they would have to migrate:
> There's also VMS, NonStop, and what remains of the Tru64 business; MPE
> seems to be pretty dead.

They seem determined to migrate the Tru64 business onto HP-UX: there's
no Tru64 for Itanium. You can still buy Alpha hardware from them, last
time I looked, but it's expensive.

The costs of another CPU migration "soon" after the PA-RISC to Itanium
one could also be a problem for HP; of course, if they start it while
they're still working on Itanium, they don't have to re-learn how to do
it. But would that argument convince their financial management? And
what would happen to their sales when the plan for another CPU
transition leaked?

already...@yahoo.com

unread,
Dec 9, 2007, 1:01:54 PM12/9/07
to
On Dec 9, 5:10 pm, j...@cix.co.uk (John Dallman) wrote:
> In article
> <fe9f237a-8aa5-4f5c-9b03-cbacf0c99...@w28g2000hsf.googlegroups.com>,

Exactly. Non-transparent on-chip memory is very useful in embedded
systems but doesn't fit the "virtualize everything" paradigm prevalent in
present-day general-purpose computing.

The same applies to on-chip user-mode programmable DMA engines. Even
if one manages to build them in the most OS-friendly manner, i.e.
operating on virtual addresses, honoring memory protection and demand
paging, and not causing problems with cache coherence, there is still
the same insurmountable problem - when you have only a few resources
of a particular type, you can't virtualize them in an efficient manner.

Del Cecchi

unread,
Dec 9, 2007, 2:52:00 PM12/9/07
to

"Chris Thomasson" <cri...@comcast.net> wrote in message
news:deednRm9t_0tPMba...@comcast.com...
If you are talking about "embedded memory" or PIM or something, IBM makes
embedded DRAM. Intel does not. There are rumors of something called Z-RAM
from AMD.

If you are talking about main memory, that is unlikely to be on chip,
since size is too big.

del


mike3

unread,
Dec 9, 2007, 2:58:21 PM12/9/07
to
On Dec 9, 9:26 am, j...@cix.co.uk (John Dallman) wrote:
> In article
> <a8f15b5d-e367-4ac6-886a-a46e680f8...@e25g2000prg.googlegroups.com>,

>
> mike4...@yahoo.com (mike3) wrote:
> > However, is it so hard as to be insurmountable?
>
> I don't know.
>
> I do observe that most ISAs that are designed are commercial failures,
> but that if they get established, there are network effects that tend to
> keep them going and competitive, which don't simply depend on their
> qualities as abstract ISAs. See the IBM 360 through to Z series for an
> example.
>
> Note that of the RISC architectures of the eighties and early nineties,
> PowerPC is the only one still going strong in general-purpose
> computing. Alpha is circling the drain, MIPS is confined to specific
> embedded markets, and SPARC is evolving towards a specialised
> many-thread engine.
>
> > And even if not, would there be any _real_ advantage if some non-x86
> > architecture were used?
>
> It isn't obvious to me that an ISA can be designed that would present a
> compelling case as a replacement for x86. Its problems have been more or
> less overcome, because - due to the network effects - there was so much
> money to be made by doing so. As I said above, I suspect that something
> very different indeed may be required to take advantage of growing
> transistor counts, something that isn't much like any of the processor
> we know today. I don't know what it would look like.
>

Do any of the already-existing non-x86 ISAs out there present any
really significant advantage in performance as general-purpose
processors?

del

unread,
Dec 10, 2007, 7:57:12 PM12/10/07
to
Anne & Lynn Wheeler wrote:
> Paul Gotch <pa...@at-cantab-dot.net> writes:
>> Indeed IBM's methodology is heavily based around a standard cell library
>> and as much automation as they can get away with. So they trade some
>> performance and power efficiency for fewer engineers and faster turnaround
>> time.
>>
>> Getting a really low power chip requires full custom layout and a tight
>> process. Or something semicustom and not mainstream like
>> http://www.intrinsity.com/ - the headroom you get out of using something
>> like Fast14 does depend critically on exactly how the core was designed to
>> start off with, though.
>
> in (at least) the 70s and 80s there were quite a few internal people
> working on VLSI design tools. I saw some amount doing work with the Los
> Gatos VLSI group. In fact, the Los Gatos VLSI tools group originally
> created the corporate pascal compiler in support of their VLSI tool
> development (which was eventually released as product on both mainframe
> and RISC).
>
> in the wake of the company going thru its downturn, including going into
> the red in 1992 ... quite a bit of the internal tool groups were
> dissolved and transition made to standard vendor tools. during this
> period, there were some number of people from VLSI tool groups doing
> technology transfer to outside vendors ... as sort of last task prior to
> their position being eliminated (some then showed up as employees of
> the vendors they had been working with).
>
> In the case of the Los Gatos group ... the bldg was leveled and the land
> sold off for housing development.

It was a beautiful place, even if their tools didn't work very well.

del

unread,
Dec 10, 2007, 8:01:36 PM12/10/07
to
Hilal wrote:
> On Dec 8, 1:33 pm, Erik Trulsson <ertr1...@student.uu.se> wrote:
>> Going to some better ISA would only give minor performance benefits - even
>> assuming you could match Intel's and AMD's process technology in other aspects.
>> To keep this other ISA faster than x86, you will also have to match Intel's
>> and AMD's R&D to keep up with their improvements in process technology - or persuade
>> both of them to switch to this new ISA. Intel tried switching ISA recently (to Itanium).
>> One reason they failed was that AMD kept improving their x86
>> offerings meanwhile, meaning customers didn't have any reason to stop using x86.
>>
>
> Itanium wasn't supposed to be a desktop PC, because its x86
> translation was slow in the beginning - a problem this solution
> avoids entirely.
> The biggest question that came up here is in what way will this new
> ISA be better?
>
> 1: Will it really give better performance than x86?
> 2: Can we make it as fast as x86 but much simpler, so that we can
> have a combination: MORE cores, BIGGER caches, MANY registers?
>
> Also, can we find a better way to handle data than the nondeterministic
> cache, like creating a local region inside the cache that programs
> can use as main memory?
>
> Or we could use these big caches for fast context switching?

>
> There are endless opportunities here! Make one thing simpler and
> expand on the things where it counts. Apparently many of you think
> that the ISA is not the biggest bottleneck, and I don't disagree, but
> what if we made it a bit "slower" and used the area we save to boost
> something else? And at the same time we are phasing out x86!
>
>> The main reason x86 currently is so successful on the desktop has nothing to
>> do with performance. It is because there is such a large number of
>> applications that are only available for x86 (mostly running Microsoft Windows.)
>>
>> To dislodge the x86 from the desktop you will need some way to have all these
>> applications (or superior ones) run (and run well) on whatever system is supposed
>> to replace x86.
>>
>
> That's the idea: you don't need to recompile everything! It is like
> running 32-bit code on a 64-bit system. Recompile only what you need,
> and it gets a dedicated CORE!!!
> The problem is with proprietary software, where you have to wait for
> the vendor, while open source can be recompiled much faster.
>
>> --
>> <Insert your favourite quote here.>
>> Erik Trulsson
>> ertr1...@student.uu.se
>
> best regards,

Gee, I love Historical Revisionism. Student dude, go into the depths of
technical publications and Usenet. IA64 was indeed intended to replace
x86 in both server and client applications. Anyone who tells you
different is engaging in the same sort of revision of history practiced
by the former Soviet Union when it would paint people out of
photographs or cut pages out of books.

Andrew Reilly

unread,
Dec 9, 2007, 9:33:45 PM12/9/07
to

You (someone, anyway) keep asking that question, and you keep getting the
same answer. Is there some other way that you could ask the question
that might produce a different answer?

To answer it again, here: clearly not.

For desktops, anyway. "General purpose" is a pretty broad church.

For "general purpose" portable, battery-powered computing devices
(phones, touch-pads, e-books and the like) it seems as though an ARM or
Renesas RISC of some sort might offer a bit of a power advantage
(although after seeing how Intel responded to TransMeta, I wouldn't be
putting a *lot* of my own money on that bet, long term.)

For "general purpose" server applications, it looks as though PowerPC,
x86-64, SPARC and Itanium can all be persuaded to be roughly competitive,
no doubt because system costs dominate processor costs at that scale.

How significant is "significant", and what do you define as "general
purpose"? If you're considering desktops, how do you differentiate
between work done by the ISA and work off-loaded to the GPU, or to the
TOE processor in the ethernet controller, or even to the SIMD arithmetic
units? Is x86-64 a different ISA to ia32?

Cheers,

--
Andrew

Chris Thomasson

unread,
Dec 10, 2007, 12:00:22 AM12/10/07
to
"Del Cecchi" <delcecchi...@gmail.com> wrote in message
news:5s2vaqF...@mid.individual.net...

>
> "Chris Thomasson" <cri...@comcast.net> wrote in message
> news:deednRm9t_0tPMba...@comcast.com...
>> "del" <delc...@gmail.com> wrote in message
>> news:5ruhevF...@mid.individual.net...
>>> Chris Thomasson wrote:
>>>> "Paul Gotch" <pa...@at-cantab-dot.net> wrote in message
>>>> news:jCs*dl...@news.chiark.greenend.org.uk...
>>>>> Chris Thomasson <cri...@comcast.net> wrote:
>>>>>> I was wondering when the chip vendors are going to start creating
>>>>>> their own
>>>>>> memory that will be directly integrated into their overall processor
>>>>>> designs.
>>>>>
>>>>> There are several issues,
>>>> [...]
[...]

> If you are talking about main memory, that is unlikely to be on chip,
> since the size is too big.

http://groups.google.com/group/comp.arch/msg/c9d99ae1251f2462

Is that crap?

Quadibloc

unread,
Dec 10, 2007, 4:09:59 AM12/10/07
to
Hilal wrote:
> <Research>:
> What we need to make this work is a simple operating system that can
> detect the different binaries and assigning them to the right core.

Actually, no research is needed.

The only reason people are insisting on keeping the x86 architecture
is to run programs written for Microsoft Windows.

Programs written to run on a different ISA can just be *labelled* to
indicate which ISA they should run on. The Macintosh has been able, in
this way, to sort out programs for the 68000 and for the PowerPC, or
programs for the PowerPC and the x86, for quite a while.

So all you need, say, is for files to end in ".ex2" instead of ".exe"
or ".com" to be directed to the right core. Advanced technology to try
and *figure out* which ISA a program uses would just add overhead and
waste cycles.
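
A sketch of all the "technology" needed, in C - with a hypothetical
bind_to_isa_core() standing in for whatever call a real OS would
provide (no such call exists today, it is named here purely for
illustration):

    /* Route a binary to a core by its file extension. */
    #include <stdio.h>
    #include <string.h>

    enum isa { ISA_X86, ISA_NEW };

    static enum isa isa_for(const char *path)
    {
        const char *dot = strrchr(path, '.');
        /* New-ISA programs are simply *labelled* with a new suffix. */
        if (dot && strcmp(dot, ".ex2") == 0)
            return ISA_NEW;
        return ISA_X86;                /* .exe, .com, anything else */
    }

    int main(int argc, char **argv)
    {
        if (argc < 2) {
            fprintf(stderr, "usage: %s program\n", argv[0]);
            return 1;
        }
        printf("%s -> %s core\n", argv[1],
               isa_for(argv[1]) == ISA_NEW ? "new-ISA" : "x86");
        /* A real loader would now do the equivalent of
         * bind_to_isa_core(isa_for(argv[1])) before exec'ing
         * the image - an assumed OS service, nothing more. */
        return 0;
    }

The dispatch is a string compare, not a research problem.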

Of course, there might be specialized applications where this would
make sense, such as machines designed to run legacy software from
multiple sources. But where the only legacy software comes from *one*
ISA, having the software for the new ISA being suitably flagged is the
obvious solution which we already know how to do.

John Savard

Quadibloc

unread,
Dec 10, 2007, 4:11:10 AM12/10/07
to
Hilal wrote:
> This is just the tip of the iceberg. Why not make a CPU that has 3-4
> different ISA? Stack-based, Register-Register and maybe VLIW? And let
> the compiler choose the best ISA for a specific application?

I've been thinking about that sort of thing too...

http://www.quadibloc.com/arch/arcint.htm

John Savard

Quadibloc

unread,
Dec 10, 2007, 4:14:26 AM12/10/07
to
On Dec 6, 12:26 pm, Benny Amorsen <benny+use...@amorsen.dk> wrote:
> "Paul A. Clayton" <paaronclay...@earthlink.net> writes:
>
> > I suspect a significant number of CS-types would heartily approve of the
> > death of x86, seeing the x86 ISA as an inelegant collection of ISA
> > extensions--unnecessarily difficult to decode, excessively limited in
> > its ability to express ILP (2 register operations, modest register
> > count), with support for unnecessarily complex operations.
>
> x86 code is dense. That's a large advantage for general-purpose code.
> Are there any RISCs that are as dense, apart from ARM with Thumb?

RISC indeed isn't very dense, but x86 is not a particularly clean
CISC. So the ideal would be a clean CISC, like the 68000 or
z/Architecture - the latter built on a legacy dating back to 1964
(System/360), but a legacy that was already 32-bit with hardware
floating point, not an 8-bit one.

John Savard

Ken Hagan

unread,
Dec 10, 2007, 5:42:56 AM12/10/07
to
On Sat, 08 Dec 2007 18:39:06 -0000, Paul Gotch <pa...@at-cantab-dot.net>
wrote:

> Personally I think the only thing that's going to get rid of x86 desktop
> PCs is the desktop PC going the way of the dodo. This is not as
> far-fetched as you think. Staying in the x86 space, the number of
> laptops shipped has
> been increasing year on year for a while now.

But those laptops are running exactly the same software as the desktops
they replace. The only thing that has changed is the case. In a few years'
time, you'll be telling me that the only thing that will dislodge x86 from
the laptop is...

Torben Ægidius Mogensen

unread,
Dec 10, 2007, 6:56:29 AM12/10/07
to
ken...@cix.compulink.co.uk writes:

> In article <5rojl9F...@mid.individual.net>,
> andrew-...@areilly.bpc-users.org (Andrew Reilly) wrote:
>
>> Back in the day, BBC micros sometimes ran with different-architecture
>> co- processors. I don't know how shared the OS environment was.
>
> From memory the Z80 co-processor ran CP/M in its own memory (up to 64K)
> with the rest of the machine being reduced to an I/O processor. You
> could change the OS of the BBC by sticking in an extension ROM. I seem to
> remember 6502 and 6809 add-on boards being available as well.

There were also Intel 80186 and National Semiconductor 32016
coprocessors available, the first running DOS and the second an OS of
Acorn's own design called Panos. In all cases, the host 6502 (as
Kenney indicated) served as I/O and video processor. In particular,
all screen updates were handled by the host processor, freeing up both
time and memory for the co-processors. An ARM coprocessor was also
made, partly for in-house development and partly as an evaluation
board for ARM.

The interface between the host and co-processor was called "The Tube".

Torben

Torben Ægidius Mogensen

unread,
Dec 10, 2007, 7:04:53 AM12/10/07
to
nos...@ab-katrinedal.dk (Niels Jørgen Kruse) writes:

> Benny Amorsen <benny+...@amorsen.dk> wrote:
>
>> x86 code is dense. That's a large advantage for general-purpose code.
>> Are there any RISCs that are as dense, apart from ARM with Thumb?
>

> The difference is not all that large. I checked the sizes of a few
> universal binaries:
>
>                     ppc      i386   % ppc/i386
>   EyeTV        16239092  15764404       3.01
>   iTunes       14873344  14577664       2.03
>   MacSOUP        936764    838868      11.7
>   Camino       13597232  13674444      -0.565
>   Cyberduck       58320     59608      -2.16

It used to be the case that RISC code was larger than CISC, but that
is mostly no longer true. This is partly because RISC processors
have added features that make the code smaller and partly because the
compilers are becoming more similar. Also, I wouldn't call x86 code
compact -- that might have been true for pure 16-bit code, but the
32-bit additions added prefix bytes to instructions, which lowered the
density. Also, with fewer registers, more things need to be stored in
memory, which requires more instructions.

Even before Thumb, ARM had code density competitive with x86, and with
later additions (not just Thumb, but also 16-bit memory transfers and
other special-case instructions) became more so.
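
Anyone who wants numbers for their own code can just compare object
sizes; a minimal test, assuming GCC plus an arm-none-eabi cross
toolchain (adjust the tool names to whatever you have installed):

    /* f.c - a trivial kernel for comparing code density.
     * Build it twice and compare the text sizes, e.g.:
     *   gcc -Os -c f.c && size f.o
     *   arm-none-eabi-gcc -mthumb -Os -c f.c && arm-none-eabi-size f.o
     */
    int sum(const int *a, int n)
    {
        int s = 0, i;
        for (i = 0; i < n; i++)
            s += a[i];
        return s;
    }

The exact numbers depend heavily on compiler version and flags, so
treat any single comparison as indicative only.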

Torben

Spoon

unread,
Dec 10, 2007, 7:24:19 AM12/10/07
to
Mitch Alsup wrote:

> Even integer division(s) in the most modern AMD chips operates
> without microcode.

The Software Optimization Guide for AMD Family 10h Processors lists DIV
and IDIV as VectorPath instructions. I had always thought VectorPath
meant microcoded?

http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/40546.pdf
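
For what it's worth, divide timing is easy enough to probe from user
space; a crude sketch (GCC on x86 assumed; rdtsc is not serializing
and the loop adds overhead, so read the result as a rough throughput
figure, not a latency spec):

    /* Rough cycles-per-divide probe for 32-bit DIV. */
    #include <stdio.h>

    static inline unsigned long long rdtsc(void)
    {
        unsigned lo, hi;
        __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
        return ((unsigned long long)hi << 32) | lo;
    }

    int main(void)
    {
        enum { N = 1000000 };
        volatile unsigned n = 0xdeadbeefu, d = 3;
        volatile unsigned sink;
        unsigned long long t0, t1;
        int i;

        t0 = rdtsc();
        for (i = 0; i < N; i++)
            sink = n / d;          /* one 32-bit DIV per pass */
        t1 = rdtsc();
        (void)sink;
        printf("~%.1f cycles per divide\n", (double)(t1 - t0) / N);
        return 0;
    }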

del

unread,
Dec 11, 2007, 9:39:27 AM12/11/07
to

Well, it is science fiction at the moment. There have been proposals
for Wafer Scale Integration, and Gene Amdahl took a run at it back in
the day, but it hasn't been around much lately.

It is much more cost-effective to buy 1 GB DIMMs for 25 dollars.

Nick Maclaren

unread,
Dec 10, 2007, 9:52:44 AM12/10/07
to

In article <5s51egF...@mid.individual.net>,
del <delc...@gmail.com> writes:

|> Chris Thomasson wrote:
|>
|> >> If you are talking about main memory, that is unlikely to be on chip,
|> >> since the size is too big.
|> >
|> > http://groups.google.com/group/comp.arch/msg/c9d99ae1251f2462
|> >
|> > Is that crap?
|>
|> Well, it is science fiction at the moment. There have been proposals
|> for Wafer Scale Integration, and Gene Amdahl took a run at it back in
|> the day, but it hasn't been around much lately.
|>
|> It is much more cost-effective to buy 1 GB DIMMs for 25 dollars.

What is far more likely is module-based integration, where multiple
chips are packaged as a unit, including CPU and memory, as Bernd Paysan
points out. Sun are working on that, and I should be flabbergasted
if IBM and others weren't, too. I don't know anything about how you
transport unpackaged chips, but I am sure that a mechanism could be
designed.

For example, two power/cooling pads top and bottom, and the edge pins
entirely for peripherals. Yes, the boards would have to have holes
in them, but that isn't insoluble :-)


Regards,
Nick Maclaren.

Paul Gotch

unread,
Dec 10, 2007, 10:00:49 AM12/10/07
to
Ken Hagan <K.H...@thermoteknix.co.uk> wrote:
> But those laptops are running exactly the same software as the desktops
> they replace.

Um, I did say "staying in the x86 space"; my second paragraph then
described mobile phones and MID devices.

The thrust of my statement is that people are abandoning desktop machines
and the perceived upgradeability and expandability that they represent for
portable machines.

It is then a much smaller step to go from a subnotebook to something like a
Nokia N810.

-p
--
"Unix is user friendly, it's just picky about who its friends are."
- Anonymous
--------------------------------------------------------------------
